radeonsi: initial WIP SI code

This commit adds initial support for acceleration
on SI chips.  egltri is starting to work.

The SI/R600 llvm backend is currently included in mesa
but that may change in the future.

The plan is to write a single gallium driver and
use gallium to support X acceleration.

This commit contains patches from:
Tom Stellard <thomas.stellard@amd.com>
Michel Dänzer <michel.daenzer@amd.com>
Alex Deucher <alexander.deucher@amd.com>
Vadim Girlin <vadimgirlin@gmail.com>

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

The following commits were squashed in:

======================================================================

radeonsi: Remove unused winsys pointer

This was removed from r600g in commit:

commit 96d882939d
Author: Marek Olšák <maraeo@gmail.com>
Date:   Fri Feb 17 01:49:49 2012 +0100

    gallium: remove unused winsys pointers in pipe_screen and pipe_context

    A winsys is already a private object of a driver.

======================================================================

radeonsi: Copy color clamping CAPs from r600

Not sure if the values of these CAPs are correct for radeonsi, but the
same changes were made to r600g in commit:

commit bc1c836938
Author: Marek Olšák <maraeo@gmail.com>
Date:   Mon Jan 23 03:11:17 2012 +0100

    st/mesa: do vertex and fragment color clamping in shaders

    For ARB_color_buffer_float. Most hardware can't do it and st/mesa is
    the perfect place for a fallback.
    The exceptions are:
    - r500 (vertex clamp only)
    - nv50 (both)
    - nvc0 (both)
    - softpipe (both)

    We also have to take into account that r300 can do CLAMPED vertex colors only,
    while r600 can do UNCLAMPED vertex colors only. The difference can be expressed
    with the two new CAPs.
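For illustration only (a hedged sketch, not code from this commit): a gallium driver answers these CAPs from its get_param callback. The reduced enum below and the r600-style answers (unclamped vertex colors only, no fragment clamping) are assumptions based on the quoted commit, not the real pipe_cap table.

```c
/* Toy reduction of the pipe_cap query to the three values discussed
 * above; the answers mirror r600-class hardware, which is why st/mesa
 * must fall back to clamping in shaders. Illustrative only. */
enum pipe_cap {
   PIPE_CAP_VERTEX_COLOR_UNCLAMPED,
   PIPE_CAP_VERTEX_COLOR_CLAMPED,
   PIPE_CAP_FRAGMENT_COLOR_CLAMPED
};

static int get_param(enum pipe_cap param)
{
   switch (param) {
   case PIPE_CAP_VERTEX_COLOR_UNCLAMPED:
      return 1;   /* hardware leaves vertex colors unclamped */
   case PIPE_CAP_VERTEX_COLOR_CLAMPED:
   case PIPE_CAP_FRAGMENT_COLOR_CLAMPED:
      return 0;   /* st/mesa has to clamp in the shader instead */
   default:
      return 0;
   }
}
```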

======================================================================

radeonsi: Remove PIPE_CAP_OUTPUT_READ

This CAP was dropped in commit:

commit 04e3240087
Author: Marek Olšák <maraeo@gmail.com>
Date:   Thu Feb 23 23:44:36 2012 +0100

    gallium: remove PIPE_SHADER_CAP_OUTPUT_READ

    r600g is the only driver which has made use of it. The reason the CAP was
    added was to fix some piglit tests when the GLSL pass lower_output_reads
    didn't exist.

    However, not removing output reads breaks the fallback for glClampColorARB,
    which assumes outputs are not readable. The fix would be non-trivial
    and my personal preference is to remove the CAP, considering that reading
    outputs is uncommon and that we can now use lower_output_reads to fix
    the issue that the CAP was supposed to workaround in the first place.
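To make the lowering idea concrete (a toy sketch, not the actual GLSL lower_output_reads pass): reads and writes of shader outputs are redirected to a readable temporary, and the temporary is copied to the real output only in an epilogue, so the outputs themselves never need to be readable. All names here are illustrative.

```c
#include <string.h>

#define NUM_OUTPUTS 4

static float outputs[NUM_OUTPUTS];   /* write-only "hardware" outputs */
static float shadow[NUM_OUTPUTS];    /* readable temporaries */

/* The lowered shader touches only the shadow copies... */
static void shader_write(int slot, float v) { shadow[slot] = v; }
static float shader_read(int slot)          { return shadow[slot]; }

/* ...and the shadow registers reach the real outputs exactly once,
 * at the end of the shader. */
static void emit_epilogue(void)
{
   memcpy(outputs, shadow, sizeof(outputs));
}
```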

======================================================================

radeonsi: Add missing parameters to rws->buffer_get_tiling() call

This was changed in commit:

commit c0c979eebc
Author: Jerome Glisse <jglisse@redhat.com>
Date:   Mon Jan 30 17:22:13 2012 -0500

    r600g: add support for common surface allocator for tiling v13

    Tiled surfaces have all kinds of alignment constraints that need to
    be met. Instead of having all this code duplicated between the ddx and
    mesa, use common code in libdrm_radeon; this also ensures that both the
    ddx and mesa compute those alignments in the same way.

    v2 fix evergreen
    v3 fix compressed texture and workaround cube texture issue by
       disabling 2D array mode for cubemap (need to check if r7xx and
       newer are also affected by the issue)
    v4 fix texture array
    v5 fix evergreen and newer, split surface values computation from
       mipmap tree generation so that we can get them directly from the
       ddx
    v6 final fix to evergreen tile split value
    v7 fix mipmap offset to avoid to use random value, use color view
       depth view to address different layer as hardware is doing some
       magic rotation depending on the layer
    v8 fix COLOR_VIEW on r6xx for linear array mode, use COLOR_VIEW on
       evergreen, align bytes per pixel to a multiple of a dword
    v9 fix handling of stencil on evergreen, half fix for compressed
       texture
    v10 fix evergreen compressed texture proper support for stencil
        tile split. Fix stencil issue when array mode was clear by
        the kernel, always program stencil bo. On evergreen depth
        buffer bo need to be big enough to hold depth buffer + stencil
        buffer as even with stencil disabled things get written there.
    v11 rebase on top of mesa, fix pitch issue with 1d surface on evergreen,
        old ddx overestimate those. Fix linear case when pitch*height < 64.
        Fix r300g.
    v12 Fix linear case when pitch*height < 64 for old path, adapt to
        libdrm API change
    v13 add libdrm check

    Signed-off-by: Jerome Glisse <jglisse@redhat.com>
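The kind of alignment arithmetic such a common allocator centralizes can be sketched as follows (the 256-byte value is only an example; the real SI constraints live in libdrm_radeon, not here):

```c
/* Round a value (e.g. a surface pitch in bytes) up to a power-of-two
 * hardware alignment. Example values only; not the real SI rules. */
static unsigned align_value(unsigned value, unsigned alignment)
{
   return (value + alignment - 1) & ~(alignment - 1);
}
```

Having one implementation of this in libdrm means the ddx and mesa cannot disagree about where a mip level or tile row starts.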

======================================================================

radeonsi: Remove PIPE_TRANSFER_MAP_PERMANENTLY

This was removed in commit:

commit 62f44f670b
Author: Marek Olšák <maraeo@gmail.com>
Date:   Mon Mar 5 13:45:00 2012 +0100

    Revert "gallium: add flag PIPE_TRANSFER_MAP_PERMANENTLY"

    This reverts commit 0950086376.

    It was decided to refactor the transfer API instead of adding workarounds
    to address the performance issues.

======================================================================

radeonsi: Handle PIPE_VIDEO_CAP_PREFERED_FORMAT.

Reintroduced in commit 9d9afcb5ba.

======================================================================

radeonsi: nuke the fallback for vertex and fragment color clamping

Ported from r600g commit c2b800cf38.

======================================================================

radeonsi: don't expose transform_feedback2 without kernel support

Ported from r600g commit 15146fd1bc.

======================================================================

radeonsi: Handle PIPE_CAP_GLSL_FEATURE_LEVEL.

Ported from r600g part of commit 171be75522.

======================================================================

radeonsi: set minimum point size to 1.0 for non-sprite non-aa points.

Ported from r600g commit f183cc9ce3.

======================================================================

radeonsi: rework and consolidate stencilref state setting.

Ported from r600g commit a2361946e7.

======================================================================

radeonsi: cleanup setting DB_SHADER_CONTROL.

Ported from r600g commit 3d061caaed.

======================================================================

radeonsi: Get rid of register masks.

Ported from r600g commits
3d061caaed13b646ff40754f8ebe73f3d4983c5b..9344ab382a1765c1a7c2560e771485edf4954fe2.

======================================================================

radeonsi: get rid of r600_context_reg.

Ported from r600g commits
9344ab382a1765c1a7c2560e771485edf4954fe2..bed20f02a771f43e1c5092254705701c228cfa7f.

======================================================================

radeonsi: Fix regression from 'Get rid of register masks'.

======================================================================

radeonsi: optimize r600_resource_va.

Ported from r600g commit 669d8766ff.

======================================================================

radeonsi: remove u8,u16,u32,u64 types.

Ported from r600g commit 78293b99b2.

======================================================================

radeonsi: merge r600_context with r600_pipe_context.

Ported from r600g commit e4340c1908.

======================================================================

radeonsi: Miscellaneous context cleanups.

Ported from r600g commits
e4340c1908a6a3b09e1a15d5195f6da7d00494d0..621e0db71c5ddcb379171064a4f720c9cf01e888.

======================================================================

radeonsi: add a new simple API for state emission.

Ported from r600g commits
621e0db71c5ddcb379171064a4f720c9cf01e888..f661405637bba32c2cfbeecf6e2e56e414e9521e.

======================================================================

radeonsi: Also remove sbu_flags member of struct r600_reg.

Requires using sid.h instead of r600d.h for the new CP_COHER_CNTL definitions,
so some code needs to be disabled for now.

======================================================================

radeonsi: Miscellaneous simplifications.

Ported from r600g commits 38bf276348 and
b0337b679a.

======================================================================

radeonsi: Handle PIPE_CAP_QUADS_FOLLOW_PROVOKING_VERTEX_CONVENTION.

Ported from commit 8b4f7b0672.

======================================================================

radeonsi: Use a fake reloc to sleep for fences.

Ported from r600g commit 8cd03b933c.

======================================================================

radeonsi: adapt to get_query_result interface change.

Ported from r600g commit 4445e170be.
Author: Tom Stellard, 2012-01-06 17:38:37 -05:00
parent e55cf4854d, commit a75c6163e6
200 changed files with 66076 additions and 10 deletions


@@ -24,7 +24,7 @@
 # BOARD_GPU_DRIVERS should be defined. The valid values are
 #
 # classic drivers: i915 i965
-# gallium drivers: swrast i915g nouveau r300g r600g vmwgfx
+# gallium drivers: swrast i915g nouveau r300g r600g radeonsi vmwgfx
 #
 # The main target is libGLES_mesa. For each classic driver enabled, a DRI
 # module will also be built. DRI modules will be loaded by libGLES_mesa.
@@ -37,7 +37,7 @@ DRM_TOP := external/drm
 DRM_GRALLOC_TOP := hardware/drm_gralloc
 classic_drivers := i915 i965
-gallium_drivers := swrast i915g nouveau r300g r600g vmwgfx
+gallium_drivers := swrast i915g nouveau r300g r600g radeonsi vmwgfx
 MESA_GPU_DRIVERS := $(strip $(BOARD_GPU_DRIVERS))


@@ -32,9 +32,12 @@ INTEL_LIBS = @INTEL_LIBS@
 INTEL_CFLAGS = @INTEL_CFLAGS@
 X11_LIBS = @X11_LIBS@
 X11_CFLAGS = @X11_CFLAGS@
+LLVM_BINDIR = @LLVM_BINDIR@
 LLVM_CFLAGS = @LLVM_CFLAGS@
+LLVM_CXXFLAGS = @LLVM_CXXFLAGS@
 LLVM_LDFLAGS = @LLVM_LDFLAGS@
 LLVM_LIBS = @LLVM_LIBS@
+LLVM_INCLUDEDIR = @LLVM_INCLUDEDIR@
 GLW_CFLAGS = @GLW_CFLAGS@
 GLX_TLS = @GLX_TLS@
 DRI_CFLAGS = @DRI_CFLAGS@
@@ -58,6 +61,9 @@ AWK = @AWK@
 GREP = @GREP@
 NM = @NM@
+# Perl
+PERL = @PERL@
 # Python and flags (generally only needed by the developers)
 PYTHON2 = @PYTHON2@
 PYTHON_FLAGS = -t -O -O


@@ -67,6 +67,8 @@ if test ! -f "$srcdir/src/glsl/glcpp/glcpp-parse.y"; then
 fi
 AC_PROG_LEX
+AC_PATH_PROG([PERL], [perl])
 dnl Our fallback install-sh is a symlink to minstall. Use the existing
 dnl configuration in that case.
 AC_PROG_INSTALL
@@ -1647,9 +1649,12 @@ if test "x$with_gallium_drivers" != x; then
     SRC_DIRS="$SRC_DIRS gallium gallium/winsys gallium/targets"
 fi
+AC_SUBST([LLVM_BINDIR])
 AC_SUBST([LLVM_CFLAGS])
+AC_SUBST([LLVM_CXXFLAGS])
 AC_SUBST([LLVM_LIBS])
 AC_SUBST([LLVM_LDFLAGS])
+AC_SUBST([LLVM_INCLUDEDIR])
 AC_SUBST([LLVM_VERSION])
 case "x$enable_opengl$enable_gles1$enable_gles2" in
@@ -1795,6 +1800,9 @@ if test "x$enable_gallium_llvm" = xyes; then
         LLVM_LIBS="`$LLVM_CONFIG --libs engine bitwriter`"
     fi
     LLVM_LDFLAGS=`$LLVM_CONFIG --ldflags`
+    LLVM_BINDIR=`$LLVM_CONFIG --bindir`
+    LLVM_CXXFLAGS=`$LLVM_CONFIG --cxxflags`
+    LLVM_INCLUDEDIR=`$LLVM_CONFIG --includedir`
     DEFINES="$DEFINES -D__STDC_CONSTANT_MACROS"
     MESA_LLVM=1
 else
@@ -1898,6 +1906,14 @@ if test "x$with_gallium_drivers" != x; then
             GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS r600"
             gallium_check_st "radeon/drm" "dri-r600" "xorg-r600" "" "xvmc-r600" "vdpau-r600" "va-r600"
             ;;
+        xradeonsi)
+            GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS radeonsi"
+            if test "x$LLVM_VERSION" != "x3.1"; then
+                AC_MSG_ERROR([LLVM 3.1 is required to build the radeonsi driver.])
+            fi
+            NEED_RADEON_GALLIUM=yes;
+            gallium_check_st "radeon/drm" "dri-radeonsi" "xorg-radeonsi"
+            ;;
         xnouveau)
             PKG_CHECK_MODULES([NOUVEAU], [libdrm_nouveau >= $LIBDRM_NOUVEAU_REQUIRED])
             GALLIUM_DRIVERS_DIRS="$GALLIUM_DRIVERS_DIRS nouveau nvfx nv50 nvc0"
@@ -1957,6 +1973,7 @@ done
 AM_CONDITIONAL(HAVE_GALAHAD_GALLIUM, test x$HAVE_GALAHAD_GALLIUM = xyes)
 AM_CONDITIONAL(HAVE_IDENTITY_GALLIUM, test x$HAVE_IDENTITY_GALLIUM = xyes)
 AM_CONDITIONAL(HAVE_NOOP_GALLIUM, test x$HAVE_NOOP_GALLIUM = xyes)
+AM_CONDITIONAL(NEED_RADEON_GALLIUM, test x$NEED_RADEON_GALLIUM = xyes)
 AC_SUBST([GALLIUM_MAKE_DIRS])
 dnl prepend CORE_DIRS to SRC_DIRS


@@ -45,6 +45,12 @@ static const int r600_chip_ids[] = {
 #undef CHIPSET
 };
+static const int radeonsi_chip_ids[] = {
+#define CHIPSET(chip, name, family) chip,
+#include "pci_ids/radeonsi_pci_ids.h"
+#undef CHIPSET
+};
 static const int vmwgfx_chip_ids[] = {
 #define CHIPSET(chip, name, family) chip,
 #include "pci_ids/vmwgfx_pci_ids.h"
@@ -65,6 +71,7 @@ static const struct {
 #endif
    { 0x1002, "r300", r300_chip_ids, ARRAY_SIZE(r300_chip_ids) },
    { 0x1002, "r600", r600_chip_ids, ARRAY_SIZE(r600_chip_ids) },
+   { 0x1002, "radeonsi", radeonsi_chip_ids, ARRAY_SIZE(radeonsi_chip_ids) },
    { 0x10de, "nouveau", NULL, -1 },
    { 0x15ad, "vmwgfx", vmwgfx_chip_ids, ARRAY_SIZE(vmwgfx_chip_ids) },
    { 0x0000, NULL, NULL, 0 },


@@ -0,0 +1,40 @@
CHIPSET(0x6780, TAHITI_6780, TAHITI)
CHIPSET(0x6784, TAHITI_6784, TAHITI)
CHIPSET(0x6788, TAHITI_678A, TAHITI)
CHIPSET(0x678A, TAHITI_678A, TAHITI)
CHIPSET(0x6790, TAHITI_6790, TAHITI)
CHIPSET(0x6798, TAHITI_6798, TAHITI)
CHIPSET(0x6799, TAHITI_6799, TAHITI)
CHIPSET(0x679A, TAHITI_679E, TAHITI)
CHIPSET(0x679E, TAHITI_679E, TAHITI)
CHIPSET(0x679F, TAHITI_679F, TAHITI)
CHIPSET(0x6800, PITCAIRN_6800, PITCAIRN)
CHIPSET(0x6801, PITCAIRN_6801, PITCAIRN)
CHIPSET(0x6802, PITCAIRN_6802, PITCAIRN)
CHIPSET(0x6808, PITCAIRN_6808, PITCAIRN)
CHIPSET(0x6809, PITCAIRN_6809, PITCAIRN)
CHIPSET(0x6810, PITCAIRN_6810, PITCAIRN)
CHIPSET(0x6818, PITCAIRN_6818, PITCAIRN)
CHIPSET(0x6819, PITCAIRN_6819, PITCAIRN)
CHIPSET(0x684C, PITCAIRN_684C, PITCAIRN)
CHIPSET(0x6820, VERDE_6820, VERDE)
CHIPSET(0x6821, VERDE_6821, VERDE)
CHIPSET(0x6823, VERDE_6824, VERDE)
CHIPSET(0x6824, VERDE_6824, VERDE)
CHIPSET(0x6825, VERDE_6825, VERDE)
CHIPSET(0x6826, VERDE_6825, VERDE)
CHIPSET(0x6827, VERDE_6827, VERDE)
CHIPSET(0x6828, VERDE_6828, VERDE)
CHIPSET(0x6829, VERDE_6829, VERDE)
CHIPSET(0x682D, VERDE_682D, VERDE)
CHIPSET(0x682F, VERDE_682F, VERDE)
CHIPSET(0x6830, VERDE_6830, VERDE)
CHIPSET(0x6831, VERDE_6831, VERDE)
CHIPSET(0x6837, VERDE_6831, VERDE)
CHIPSET(0x6838, VERDE_6838, VERDE)
CHIPSET(0x6839, VERDE_6839, VERDE)
CHIPSET(0x683B, VERDE_683B, VERDE)
CHIPSET(0x683D, VERDE_683D, VERDE)
CHIPSET(0x683F, VERDE_683F, VERDE)


@@ -107,8 +107,8 @@ gallium_DRIVERS += \
 LOCAL_SHARED_LIBRARIES += libdrm_nouveau
 endif
-# r300g/r600g
-ifneq ($(filter r300g r600g, $(MESA_GPU_DRIVERS)),)
+# r300g/r600g/radeonsi
+ifneq ($(filter r300g r600g radeonsi, $(MESA_GPU_DRIVERS)),)
 gallium_DRIVERS += libmesa_winsys_radeon
 ifneq ($(filter r300g, $(MESA_GPU_DRIVERS)),)
 gallium_DRIVERS += libmesa_pipe_r300
@@ -116,6 +116,9 @@ endif
 ifneq ($(filter r600g, $(MESA_GPU_DRIVERS)),)
 gallium_DRIVERS += libmesa_pipe_r600
 endif
+ifneq ($(filter radeonsi, $(MESA_GPU_DRIVERS)),)
+gallium_DRIVERS += libmesa_pipe_radeonsi
+endif
 endif
 # vmwgfx


@@ -49,8 +49,8 @@ SUBDIRS += \
 drivers/nvc0
 endif
-# r300g/r600g
-ifneq ($(filter r300g r600g, $(MESA_GPU_DRIVERS)),)
+# r300g/r600g/radeonsi
+ifneq ($(filter r300g r600g radeonsi, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += winsys/radeon/drm
 ifneq ($(filter r300g, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += drivers/r300
@@ -58,6 +58,9 @@ endif
 ifneq ($(filter r600g, $(MESA_GPU_DRIVERS)),)
 SUBDIRS += drivers/r600
 endif
+ifneq ($(filter radeonsi, $(MESA_GPU_DRIVERS)),)
+SUBDIRS += drivers/radeonsi
+endif
 endif
 # vmwgfx


@@ -33,6 +33,7 @@ if env['drm']:
     SConscript([
         'drivers/r300/SConscript',
         'drivers/r600/SConscript',
+        'drivers/radeonsi/SConscript',
     ])
     # XXX: nouveau drivers have a tight dependency on libdrm, so to enable
     # we need some version logic before we enable them. Also, ATM there is
@@ -152,6 +153,7 @@ if not env['embedded']:
     SConscript([
         'targets/dri-r300/SConscript',
         'targets/dri-r600/SConscript',
+        'targets/dri-radeonsi/SConscript',
     ])
 if env['xorg'] and env['drm']:


@@ -10,6 +10,8 @@ AM_CPPFLAGS = \
 noinst_LIBRARIES =
+SUBDIRS =
 ################################################################################
 if HAVE_GALAHAD_GALLIUM
@@ -52,7 +54,16 @@ noop_libnoop_a_SOURCES = \
 endif
 ################################################################################
-SUBDIRS = $(GALLIUM_MAKE_DIRS)
+if NEED_RADEON_GALLIUM
+SUBDIRS+= radeon
+endif
+################################################################################
+SUBDIRS+= $(GALLIUM_MAKE_DIRS)
 # FIXME: Remove when the rest of Gallium is converted to automake.
 default: all


@@ -0,0 +1,47 @@
//===-- AMDGPU.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPU_H
#define AMDGPU_H
#include "AMDGPUTargetMachine.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetMachine.h"
namespace llvm {
class FunctionPass;
class AMDGPUTargetMachine;
FunctionPass *createR600CodeEmitterPass(formatted_raw_ostream &OS);
FunctionPass *createR600LowerShaderInstructionsPass(TargetMachine &tm);
FunctionPass *createR600LowerInstructionsPass(TargetMachine &tm);
FunctionPass *createSIAssignInterpRegsPass(TargetMachine &tm);
FunctionPass *createSIConvertToISAPass(TargetMachine &tm);
FunctionPass *createSIInitMachineFunctionInfoPass(TargetMachine &tm);
FunctionPass *createSILowerShaderInstructionsPass(TargetMachine &tm);
FunctionPass *createSIPropagateImmReadsPass(TargetMachine &tm);
FunctionPass *createSICodeEmitterPass(formatted_raw_ostream &OS);
FunctionPass *createAMDGPUReorderPreloadInstructionsPass(TargetMachine &tm);
FunctionPass *createAMDGPULowerShaderInstructionsPass(TargetMachine &tm);
FunctionPass *createAMDGPUDelimitInstGroupsPass(TargetMachine &tm);
FunctionPass *createAMDGPUConvertToISAPass(TargetMachine &tm);
FunctionPass *createAMDGPUFixRegClassesPass(TargetMachine &tm);
} /* End namespace llvm */
#endif /* AMDGPU_H */


@@ -0,0 +1,44 @@
#===-- AMDGPUConstants.pm - TODO: Add brief description -------===#
#
# The LLVM Compiler Infrastructure
#
# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.
#
#===----------------------------------------------------------------------===#
#
# TODO: Add full description
#
#===----------------------------------------------------------------------===#
package AMDGPUConstants;
use base 'Exporter';
use constant CONST_REG_COUNT => 256;
use constant TEMP_REG_COUNT => 128;
our @EXPORT = ('TEMP_REG_COUNT', 'CONST_REG_COUNT', 'get_hw_index', 'get_chan_str');
sub get_hw_index {
my ($index) = @_;
return int($index / 4);
}
sub get_chan_str {
my ($index) = @_;
my $chan = $index % 4;
if ($chan == 0 ) {
return 'X';
} elsif ($chan == 1) {
return 'Y';
} elsif ($chan == 2) {
return 'Z';
} elsif ($chan == 3) {
return 'W';
} else {
die("Unknown chan value: $chan");
}
}
1;


@@ -0,0 +1,65 @@
//===-- AMDGPUConvertToISA.cpp - Lower AMDIL to HW ISA --------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This pass lowers AMDIL machine instructions to the appropriate hardware
// instructions.
//
//===----------------------------------------------------------------------===//
#include "AMDGPU.h"
#include "AMDGPUInstrInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
using namespace llvm;
namespace {
class AMDGPUConvertToISAPass : public MachineFunctionPass {
private:
static char ID;
TargetMachine &TM;
void lowerFLT(MachineInstr &MI);
public:
AMDGPUConvertToISAPass(TargetMachine &tm) :
MachineFunctionPass(ID), TM(tm) { }
virtual bool runOnMachineFunction(MachineFunction &MF);
};
} /* End anonymous namespace */
char AMDGPUConvertToISAPass::ID = 0;
FunctionPass *llvm::createAMDGPUConvertToISAPass(TargetMachine &tm) {
return new AMDGPUConvertToISAPass(tm);
}
bool AMDGPUConvertToISAPass::runOnMachineFunction(MachineFunction &MF)
{
const AMDGPUInstrInfo * TII =
static_cast<const AMDGPUInstrInfo*>(TM.getInstrInfo());
for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
BB != BB_E; ++BB) {
MachineBasicBlock &MBB = *BB;
for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
I != MBB.end(); I = Next, Next = llvm::next(I) ) {
MachineInstr &MI = *I;
MachineInstr * newInstr = TII->convertToISA(MI, MF, MBB.findDebugLoc(I));
if (!newInstr) {
continue;
}
MBB.insert(I, newInstr);
MI.eraseFromParent();
}
}
return false;
}


@@ -0,0 +1,126 @@
#===-- AMDGPUGenInstrEnums.pl - TODO: Add brief description -------===#
#
# The LLVM Compiler Infrastructure
#
# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.
#
#===----------------------------------------------------------------------===#
#
# TODO: Add full description
#
#===----------------------------------------------------------------------===#
use warnings;
use strict;
my @F32_MULTICLASSES = qw {
UnaryIntrinsicFloat
UnaryIntrinsicFloatScalar
BinaryIntrinsicFloat
TernaryIntrinsicFloat
BinaryOpMCFloat
};
my @I32_MULTICLASSES = qw {
BinaryOpMCInt
BinaryOpMCi32
BinaryOpMCi32Const
};
my @GENERATION_ENUM = qw {
R600_CAYMAN
R600
EG
EG_CAYMAN
CAYMAN
SI
};
my $FILE_TYPE = $ARGV[0];
open AMDIL, '<', 'AMDILInstructions.td';
my @INST_ENUMS = ('NONE', 'FEQ', 'FGE', 'FLT', 'FNE', 'MOVE_f32', 'MOVE_i32', 'FTOI', 'ITOF', 'CMOVLOG_f32', 'UGT', 'IGE', 'INE', 'UGE', 'IEQ');
while (<AMDIL>) {
if ($_ =~ /defm\s+([A-Z_]+)\s+:\s+([A-Za-z0-9]+)</) {
if (grep {$_ eq $2} @F32_MULTICLASSES) {
push @INST_ENUMS, "$1\_f32";
} elsif (grep {$_ eq $2} @I32_MULTICLASSES) {
push @INST_ENUMS, "$1\_i32";
}
} elsif ($_ =~ /def\s+([A-Z_]+)(_[fi]32)/) {
push @INST_ENUMS, "$1$2";
}
}
if ($FILE_TYPE eq 'td') {
print_td_enum('AMDILInst', 'AMDILInstEnums', 'field bits<16>', @INST_ENUMS);
print_td_enum('AMDGPUGen', 'AMDGPUGenEnums', 'field bits<3>', @GENERATION_ENUM);
my %constants = (
'PI' => '0x40490fdb',
'TWO_PI' => '0x40c90fdb',
'TWO_PI_INV' => '0x3e22f983'
);
print "class Constants {\n";
foreach (keys(%constants)) {
print "int $_ = $constants{$_};\n";
}
print "}\n";
print "def CONST : Constants;\n";
} elsif ($FILE_TYPE eq 'h') {
print "unsigned GetRealAMDILOpcode(unsigned internalOpcode) const;\n";
print_h_enum('AMDILTblgenOpcode', @INST_ENUMS);
print_h_enum('AMDGPUGen', @GENERATION_ENUM);
} elsif ($FILE_TYPE eq 'inc') {
print "unsigned AMDGPUInstrInfo::GetRealAMDILOpcode(unsigned internalOpcode) const\n{\n";
print " switch(internalOpcode) {\n";
#Start at 1 so we skip NONE
for (my $i = 1; $i < scalar(@INST_ENUMS); $i++) {
my $inst = $INST_ENUMS[$i];
print " case AMDGPUInstrInfo::$inst: return AMDIL::$inst;\n";
}
print " default: abort();\n";
print " }\n}\n";
}
sub print_td_enum {
my ($instance, $class, $field, @values) = @_;
print "class $class {\n";
for (my $i = 0; $i < scalar(@values); $i++) {
print " $field $values[$i] = $i;\n";
}
print "}\n";
print "def $instance : $class;\n";
}
sub print_h_enum {
my ($enum, @list) = @_;
print "enum $enum {\n";
for (my $i = 0; $i < scalar(@list); $i++) {
print " $list[$i] = $i";
if ($i != $#list) {
print ',';
}
print "\n";
}
print "};\n";
}


@@ -0,0 +1,30 @@
#===-- AMDGPUGenShaderPatterns.pl - TODO: Add brief description -------===#
#
# The LLVM Compiler Infrastructure
#
# This file is distributed under the University of Illinois Open Source
# License. See LICENSE.TXT for details.
#
#===----------------------------------------------------------------------===#
#
# TODO: Add full description
#
#===----------------------------------------------------------------------===#
use strict;
use warnings;
use AMDGPUConstants;
my $reg_prefix = $ARGV[0];
for (my $i = 0; $i < CONST_REG_COUNT * 4; $i++) {
my $index = get_hw_index($i);
my $chan = get_chan_str($i);
print <<STRING;
def : Pat <
(int_AMDGPU_load_const $i),
(f32 (MOV (f32 $reg_prefix$index\_$chan)))
>;
STRING
}


@@ -0,0 +1,31 @@
//===-- AMDGPUISelLowering.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPUISelLowering.h"
#include "AMDGPUUtil.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;
AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
AMDILTargetLowering(TM)
{
}
void AMDGPUTargetLowering::addLiveIn(MachineInstr * MI,
MachineFunction * MF, MachineRegisterInfo & MRI,
const struct TargetInstrInfo * TII, unsigned reg) const
{
AMDGPU::utilAddLiveIn(MF, MRI, TII, reg, MI->getOperand(0).getReg());
}


@@ -0,0 +1,35 @@
//===-- AMDGPUISelLowering.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPUISELLOWERING_H
#define AMDGPUISELLOWERING_H
#include "AMDILISelLowering.h"
namespace llvm {
class AMDGPUTargetLowering : public AMDILTargetLowering
{
protected:
void addLiveIn(MachineInstr * MI, MachineFunction * MF,
MachineRegisterInfo & MRI, const struct TargetInstrInfo * TII,
unsigned reg) const;
public:
AMDGPUTargetLowering(TargetMachine &TM);
};
} /* End namespace llvm */
#endif /* AMDGPUISELLOWERING_H */


@@ -0,0 +1,116 @@
//===-- AMDGPUInstrInfo.cpp - Base class for AMD GPU InstrInfo ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the implementation of the TargetInstrInfo class that is
// common to all AMD GPUs.
//
//===----------------------------------------------------------------------===//
#include "AMDGPUInstrInfo.h"
#include "AMDGPURegisterInfo.h"
#include "AMDGPUTargetMachine.h"
#include "AMDIL.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
using namespace llvm;
AMDGPUInstrInfo::AMDGPUInstrInfo(AMDGPUTargetMachine &tm)
: AMDILInstrInfo(tm), TM(tm)
{
const AMDILDevice * dev = TM.getSubtarget<AMDILSubtarget>().device();
for (unsigned i = 0; i < AMDIL::INSTRUCTION_LIST_END; i++) {
const MCInstrDesc & instDesc = get(i);
uint32_t instGen = (instDesc.TSFlags >> 40) & 0x7;
uint32_t inst = (instDesc.TSFlags >> 48) & 0xffff;
if (inst == 0) {
continue;
}
switch (instGen) {
case AMDGPUInstrInfo::R600_CAYMAN:
if (dev->getGeneration() > AMDILDeviceInfo::HD6XXX) {
continue;
}
break;
case AMDGPUInstrInfo::R600:
if (dev->getGeneration() != AMDILDeviceInfo::HD4XXX) {
continue;
}
break;
case AMDGPUInstrInfo::EG_CAYMAN:
if (dev->getGeneration() < AMDILDeviceInfo::HD5XXX
|| dev->getGeneration() > AMDILDeviceInfo::HD6XXX) {
continue;
}
break;
case AMDGPUInstrInfo::CAYMAN:
if (dev->getDeviceFlag() != OCL_DEVICE_CAYMAN) {
continue;
}
break;
case AMDGPUInstrInfo::SI:
if (dev->getGeneration() != AMDILDeviceInfo::HD7XXX) {
continue;
}
break;
default:
abort();
break;
}
unsigned amdilOpcode = GetRealAMDILOpcode(inst);
amdilToISA[amdilOpcode] = instDesc.Opcode;
}
}
MachineInstr * AMDGPUInstrInfo::convertToISA(MachineInstr & MI, MachineFunction &MF,
DebugLoc DL) const
{
MachineInstrBuilder newInstr;
MachineRegisterInfo &MRI = MF.getRegInfo();
const AMDGPURegisterInfo & RI = getRegisterInfo();
unsigned ISAOpcode = getISAOpcode(MI.getOpcode());
/* Create the new instruction */
newInstr = BuildMI(MF, DL, TM.getInstrInfo()->get(ISAOpcode));
for (unsigned i = 0; i < MI.getNumOperands(); i++) {
MachineOperand &MO = MI.getOperand(i);
/* Convert dst regclass to one that is supported by the ISA */
if (MO.isReg() && MO.isDef()) {
if (TargetRegisterInfo::isVirtualRegister(MO.getReg())) {
const TargetRegisterClass * oldRegClass = MRI.getRegClass(MO.getReg());
const TargetRegisterClass * newRegClass = RI.getISARegClass(oldRegClass);
assert(newRegClass);
MRI.setRegClass(MO.getReg(), newRegClass);
}
}
/* Add the operand to the new instruction */
newInstr.addOperand(MO);
}
return newInstr;
}
unsigned AMDGPUInstrInfo::getISAOpcode(unsigned opcode) const
{
if (amdilToISA.count(opcode) == 0) {
return opcode;
} else {
return amdilToISA.find(opcode)->second;
}
}
bool AMDGPUInstrInfo::isRegPreload(const MachineInstr &MI) const
{
return (get(MI.getOpcode()).TSFlags >> AMDGPU_TFLAG_SHIFTS::PRELOAD_REG) & 0x1;
}
#include "AMDGPUInstrEnums.include"


@@ -0,0 +1,59 @@
//===-- AMDGPUInstrInfo.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPUINSTRUCTIONINFO_H_
#define AMDGPUINSTRUCTIONINFO_H_
#include "AMDGPURegisterInfo.h"
#include "AMDILInstrInfo.h"
#include <map>
namespace llvm {
class AMDGPUTargetMachine;
class MachineFunction;
class MachineInstr;
class MachineInstrBuilder;
class AMDGPUInstrInfo : public AMDILInstrInfo {
private:
AMDGPUTargetMachine & TM;
std::map<unsigned, unsigned> amdilToISA;
public:
explicit AMDGPUInstrInfo(AMDGPUTargetMachine &tm);
virtual const AMDGPURegisterInfo &getRegisterInfo() const = 0;
virtual unsigned getISAOpcode(unsigned AMDILopcode) const;
virtual MachineInstr * convertToISA(MachineInstr & MI, MachineFunction &MF,
DebugLoc DL) const;
bool isRegPreload(const MachineInstr &MI) const;
#include "AMDGPUInstrEnums.h.include"
};
} // End llvm namespace
/* AMDGPU target flags are stored in bits 32-39 */
namespace AMDGPU_TFLAG_SHIFTS {
enum TFLAGS {
PRELOAD_REG = 32
};
}
#endif // AMDGPUINSTRINFO_H_


@@ -0,0 +1,90 @@
//===-- AMDGPUInstructions.td - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
include "AMDGPUInstrEnums.td"
class AMDGPUInst <dag outs, dag ins, string asm, list<dag> pattern> : Instruction {
field bits<16> AMDILOp = 0;
field bits<3> Gen = 0;
field bit PreloadReg = 0;
let Namespace = "AMDIL";
let OutOperandList = outs;
let InOperandList = ins;
let AsmString = asm;
let Pattern = pattern;
let TSFlags{32} = PreloadReg;
let TSFlags{42-40} = Gen;
let TSFlags{63-48} = AMDILOp;
}
class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
: AMDGPUInst<outs, ins, asm, pattern> {
field bits<32> Inst = 0xffffffff;
}
let isCodeGenOnly = 1 in {
def EXPORT_REG : AMDGPUShaderInst <
(outs),
(ins GPRF32:$src),
"EXPORT_REG $src",
[(int_AMDGPU_export_reg GPRF32:$src)]
>;
def LOAD_INPUT : AMDGPUShaderInst <
(outs GPRF32:$dst),
(ins i32imm:$src),
"LOAD_INPUT $dst, $src",
[] >{
let PreloadReg = 1;
}
def MASK_WRITE : AMDGPUShaderInst <
(outs),
(ins GPRF32:$src),
"MASK_WRITE $src",
[]
>;
def RESERVE_REG : AMDGPUShaderInst <
(outs GPRF32:$dst),
(ins i32imm:$src),
"RESERVE_REG $dst, $src",
[(set GPRF32:$dst, (int_AMDGPU_reserve_reg imm:$src))]> {
let PreloadReg = 1;
}
def STORE_OUTPUT : AMDGPUShaderInst <
(outs GPRF32:$dst),
(ins GPRF32:$src0, i32imm:$src1),
"STORE_OUTPUT $dst, $src0, $src1",
[(set GPRF32:$dst, (int_AMDGPU_store_output GPRF32:$src0, imm:$src1))]
>;
}
/* Generic helper patterns for intrinsics */
/* -------------------------------------- */
class POW_Common <AMDGPUInst log_ieee, AMDGPUInst exp_ieee, AMDGPUInst mul,
RegisterClass rc> : Pat <
(int_AMDGPU_pow rc:$src0, rc:$src1),
(exp_ieee (mul rc:$src1, (log_ieee rc:$src0)))
>;
include "R600Instructions.td"
include "SIInstrInfo.td"

@@ -0,0 +1,56 @@
//===-- AMDGPUIntrinsics.td - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
let TargetPrefix = "AMDGPU", isTarget = 1 in {
def int_AMDGPU_export_reg : Intrinsic<[], [llvm_float_ty], []>;
def int_AMDGPU_load_const : Intrinsic<[llvm_float_ty], [llvm_i32_ty], []>;
def int_AMDGPU_load_imm : Intrinsic<[llvm_v4f32_ty], [llvm_i32_ty], []>;
def int_AMDGPU_reserve_reg : Intrinsic<[llvm_float_ty], [llvm_i32_ty], []>;
def int_AMDGPU_store_output : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_i32_ty], []>;
def int_AMDGPU_swizzle : Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_i32_ty], []>;
def int_AMDGPU_arl : Intrinsic<[llvm_i32_ty], [llvm_float_ty], []>;
def int_AMDGPU_cndlt : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_cos : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_div : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_dp4 : Intrinsic<[llvm_float_ty], [llvm_v4f32_ty, llvm_v4f32_ty], []>;
def int_AMDGPU_floor : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_kill : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_kilp : Intrinsic<[], [], []>;
def int_AMDGPU_lrp : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_mul : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_pow : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_rcp : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_rsq : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_seq : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_sgt : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_sge : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_sin : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_sle : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_sne : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_ssg : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
def int_AMDGPU_mullit : Intrinsic<[llvm_v4f32_ty], [llvm_float_ty, llvm_float_ty, llvm_float_ty], []>;
def int_AMDGPU_tex : Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
def int_AMDGPU_txb : Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
def int_AMDGPU_txd : Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
def int_AMDGPU_txl : Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_i32_ty, llvm_i32_ty], []>;
def int_AMDGPU_trunc : Intrinsic<[llvm_float_ty], [llvm_float_ty], []>;
}
let TargetPrefix = "TGSI", isTarget = 1 in {
def int_TGSI_lit_z : Intrinsic<[llvm_float_ty], [llvm_float_ty, llvm_float_ty, llvm_float_ty],[]>;
}
include "SIIntrinsics.td"

@@ -0,0 +1,38 @@
//===-- AMDGPULowerShaderInstructions.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPULowerShaderInstructions.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Target/TargetInstrInfo.h"
using namespace llvm;
void AMDGPULowerShaderInstructionsPass::preloadRegister(MachineFunction * MF,
const TargetInstrInfo * TII, unsigned physReg, unsigned virtReg) const
{
if (!MRI->isLiveIn(physReg)) {
MRI->addLiveIn(physReg, virtReg);
MachineBasicBlock &EntryMBB = MF->front();
BuildMI(MF->front(), EntryMBB.begin(), DebugLoc(), TII->get(TargetOpcode::COPY),
virtReg)
.addReg(physReg);
} else {
/* We can't mark the same register as preloaded twice, but we still must
* associate virtReg with the correct preloaded register. */
unsigned newReg = MRI->getLiveInVirtReg(physReg);
MRI->replaceRegWith(virtReg, newReg);
}
}

@@ -0,0 +1,40 @@
//===-- AMDGPULowerShaderInstructions.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPU_LOWER_SHADER_INSTRUCTIONS
#define AMDGPU_LOWER_SHADER_INSTRUCTIONS
namespace llvm {
class MachineFunction;
class MachineRegisterInfo;
class TargetInstrInfo;
class AMDGPULowerShaderInstructionsPass {
protected:
MachineRegisterInfo * MRI;
/**
* @param physReg The physical register that will be preloaded.
* @param virtReg The virtual register that currently holds the
* preloaded value.
*/
void preloadRegister(MachineFunction * MF, const TargetInstrInfo * TII,
unsigned physReg, unsigned virtReg) const;
};
} // end namespace llvm
#endif // AMDGPU_LOWER_SHADER_INSTRUCTIONS

@@ -0,0 +1,24 @@
//===-- AMDGPURegisterInfo.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPURegisterInfo.h"
#include "AMDGPUTargetMachine.h"
using namespace llvm;
AMDGPURegisterInfo::AMDGPURegisterInfo(AMDGPUTargetMachine &tm,
const TargetInstrInfo &tii)
: AMDILRegisterInfo(tm, tii),
TM(tm),
TII(tii)
{ }

@@ -0,0 +1,38 @@
//===-- AMDGPURegisterInfo.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPUREGISTERINFO_H_
#define AMDGPUREGISTERINFO_H_
#include "AMDILRegisterInfo.h"
namespace llvm {
class AMDGPUTargetMachine;
class TargetInstrInfo;
struct AMDGPURegisterInfo : public AMDILRegisterInfo
{
AMDGPUTargetMachine &TM;
const TargetInstrInfo &TII;
AMDGPURegisterInfo(AMDGPUTargetMachine &tm, const TargetInstrInfo &tii);
virtual BitVector getReservedRegs(const MachineFunction &MF) const = 0;
virtual const TargetRegisterClass *
getISARegClass(const TargetRegisterClass * rc) const = 0;
};
} // End namespace llvm
#endif // AMDGPUREGISTERINFO_H_

@@ -0,0 +1,22 @@
//===-- AMDGPURegisterInfo.td - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
let Namespace = "AMDIL" in {
def sel_x : SubRegIndex;
def sel_y : SubRegIndex;
def sel_z : SubRegIndex;
def sel_w : SubRegIndex;
}
include "R600RegisterInfo.td"
include "SIRegisterInfo.td"

@@ -0,0 +1,66 @@
//===-- AMDGPUReorderPreloadInstructions.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPU.h"
#include "AMDIL.h"
#include "AMDILInstrInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Function.h"
using namespace llvm;
namespace {
class AMDGPUReorderPreloadInstructionsPass : public MachineFunctionPass {
private:
static char ID;
TargetMachine &TM;
public:
AMDGPUReorderPreloadInstructionsPass(TargetMachine &tm) :
MachineFunctionPass(ID), TM(tm) { }
bool runOnMachineFunction(MachineFunction &MF);
const char *getPassName() const { return "AMDGPU Reorder Preload Instructions"; }
};
} /* End anonymous namespace */
char AMDGPUReorderPreloadInstructionsPass::ID = 0;
FunctionPass *llvm::createAMDGPUReorderPreloadInstructionsPass(TargetMachine &tm) {
return new AMDGPUReorderPreloadInstructionsPass(tm);
}
/* This pass moves instructions that represent preloaded registers to the
* start of the program. */
bool AMDGPUReorderPreloadInstructionsPass::runOnMachineFunction(MachineFunction &MF)
{
const AMDGPUInstrInfo * TII =
static_cast<const AMDGPUInstrInfo*>(TM.getInstrInfo());
for (MachineFunction::iterator BB = MF.begin(), BB_E = MF.end();
BB != BB_E; ++BB) {
MachineBasicBlock &MBB = *BB;
for (MachineBasicBlock::iterator I = MBB.begin(), Next = llvm::next(I);
I != MBB.end(); I = Next, Next = llvm::next(I) ) {
MachineInstr &MI = *I;
if (TII->isRegPreload(MI)) {
MF.front().insert(MF.front().begin(), MI.removeFromParent());
}
}
}
return false;
}

@@ -0,0 +1,180 @@
//===-- AMDGPUTargetMachine.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPUTargetMachine.h"
#include "AMDGPU.h"
#include "AMDILGlobalManager.h"
#include "AMDILKernelManager.h"
#include "AMDILTargetMachine.h"
#include "R600ISelLowering.h"
#include "R600InstrInfo.h"
#include "R600KernelParameters.h"
#include "SIISelLowering.h"
#include "SIInstrInfo.h"
#include "llvm/Analysis/Passes.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/PassManager.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/raw_os_ostream.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/Scalar.h"
using namespace llvm;
AMDGPUTargetMachine::AMDGPUTargetMachine(const Target &T, StringRef TT,
StringRef CPU, StringRef FS,
TargetOptions Options,
Reloc::Model RM, CodeModel::Model CM,
CodeGenOpt::Level OptLevel
)
:
AMDILTargetMachine(T, TT, CPU, FS, Options, RM, CM, OptLevel),
Subtarget(TT, CPU, FS),
mGM(new AMDILGlobalManager(0 /* Debug mode */)),
mKM(new AMDILKernelManager(this, mGM)),
mDump(false)
{
/* XXX: Add these two initializations to fix a segfault, not sure if this
* is correct. These are normally initialized in the AsmPrinter, but AMDGPU
* does not use the asm printer */
Subtarget.setGlobalManager(mGM);
Subtarget.setKernelManager(mKM);
/* TLInfo uses InstrInfo so it must be initialized after. */
if (Subtarget.device()->getGeneration() <= AMDILDeviceInfo::HD6XXX) {
InstrInfo = new R600InstrInfo(*this);
TLInfo = new R600TargetLowering(*this);
} else {
InstrInfo = new SIInstrInfo(*this);
TLInfo = new SITargetLowering(*this);
}
}
AMDGPUTargetMachine::~AMDGPUTargetMachine()
{
delete mGM;
delete mKM;
}
bool AMDGPUTargetMachine::addPassesToEmitFile(PassManagerBase &PM,
formatted_raw_ostream &Out,
CodeGenFileType FileType,
bool DisableVerify) {
/* XXX: Hack here addPassesToEmitFile will fail, but this is Ok since we are
* only using it to access addPassesToGenerateCode() */
bool fail = LLVMTargetMachine::addPassesToEmitFile(PM, Out, FileType,
DisableVerify);
assert(fail);
const AMDILSubtarget &STM = getSubtarget<AMDILSubtarget>();
std::string gpu = STM.getDeviceName();
if (gpu == "SI") {
PM.add(createSICodeEmitterPass(Out));
} else if (Subtarget.device()->getGeneration() <= AMDILDeviceInfo::HD6XXX) {
PM.add(createR600CodeEmitterPass(Out));
} else {
abort();
return true;
}
PM.add(createGCInfoDeleter());
return false;
}
namespace {
class AMDGPUPassConfig : public TargetPassConfig {
public:
AMDGPUPassConfig(AMDGPUTargetMachine *TM, PassManagerBase &PM)
: TargetPassConfig(TM, PM) {}
AMDGPUTargetMachine &getAMDGPUTargetMachine() const {
return getTM<AMDGPUTargetMachine>();
}
virtual bool addPreISel();
virtual bool addInstSelector();
virtual bool addPreRegAlloc();
virtual bool addPostRegAlloc();
virtual bool addPreSched2();
virtual bool addPreEmitPass();
};
} // End of anonymous namespace
TargetPassConfig *AMDGPUTargetMachine::createPassConfig(PassManagerBase &PM) {
return new AMDGPUPassConfig(this, PM);
}
bool
AMDGPUPassConfig::addPreISel()
{
const AMDILSubtarget &ST = TM->getSubtarget<AMDILSubtarget>();
if (ST.device()->getGeneration() <= AMDILDeviceInfo::HD6XXX) {
PM.add(createR600KernelParametersPass(
getAMDGPUTargetMachine().getTargetData()));
}
return false;
}
bool AMDGPUPassConfig::addInstSelector() {
PM.add(createAMDILBarrierDetect(*TM));
PM.add(createAMDILPrintfConvert(*TM));
PM.add(createAMDILInlinePass(*TM));
PM.add(createAMDILPeepholeOpt(*TM));
PM.add(createAMDILISelDag(getAMDGPUTargetMachine()));
return false;
}
bool AMDGPUPassConfig::addPreRegAlloc() {
const AMDILSubtarget &ST = TM->getSubtarget<AMDILSubtarget>();
if (ST.device()->getGeneration() == AMDILDeviceInfo::HD7XXX) {
PM.add(createSIInitMachineFunctionInfoPass(*TM));
}
PM.add(createAMDGPUReorderPreloadInstructionsPass(*TM));
if (ST.device()->getGeneration() <= AMDILDeviceInfo::HD6XXX) {
PM.add(createR600LowerShaderInstructionsPass(*TM));
PM.add(createR600LowerInstructionsPass(*TM));
} else {
PM.add(createSILowerShaderInstructionsPass(*TM));
PM.add(createSIAssignInterpRegsPass(*TM));
PM.add(createSIConvertToISAPass(*TM));
}
PM.add(createAMDGPUConvertToISAPass(*TM));
return false;
}
bool AMDGPUPassConfig::addPostRegAlloc() {
return false;
}
bool AMDGPUPassConfig::addPreSched2() {
return false;
}
bool AMDGPUPassConfig::addPreEmitPass() {
const AMDILSubtarget &ST = TM->getSubtarget<AMDILSubtarget>();
PM.add(createAMDILCFGPreparationPass(*TM));
PM.add(createAMDILCFGStructurizerPass(*TM));
if (ST.device()->getGeneration() == AMDILDeviceInfo::HD7XXX) {
PM.add(createSIPropagateImmReadsPass(*TM));
}
PM.add(createAMDILIOExpansion(*TM));
return false;
}

@@ -0,0 +1,62 @@
//===-- AMDGPUTargetMachine.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPU_TARGET_MACHINE_H
#define AMDGPU_TARGET_MACHINE_H
#include "AMDGPUInstrInfo.h"
#include "AMDILTargetMachine.h"
#include "R600ISelLowering.h"
#include "llvm/ADT/OwningPtr.h"
#include "llvm/Target/TargetData.h"
namespace llvm {
MCAsmInfo* createMCAsmInfo(const Target &T, StringRef TT);
class AMDGPUTargetMachine : public AMDILTargetMachine {
AMDILSubtarget Subtarget;
const AMDGPUInstrInfo * InstrInfo;
AMDGPUTargetLowering * TLInfo;
AMDILGlobalManager *mGM;
AMDILKernelManager *mKM;
bool mDump;
public:
AMDGPUTargetMachine(const Target &T, StringRef TT, StringRef CPU,
StringRef FS,
TargetOptions Options,
Reloc::Model RM, CodeModel::Model CM,
CodeGenOpt::Level OL);
~AMDGPUTargetMachine();
virtual const AMDGPUInstrInfo *getInstrInfo() const {return InstrInfo;}
virtual const AMDILSubtarget *getSubtargetImpl() const {return &Subtarget; }
virtual const AMDGPURegisterInfo *getRegisterInfo() const {
return &InstrInfo->getRegisterInfo();
}
virtual AMDGPUTargetLowering * getTargetLowering() const {
return TLInfo;
}
virtual TargetPassConfig *createPassConfig(PassManagerBase &PM);
virtual bool addPassesToEmitFile(PassManagerBase &PM,
formatted_raw_ostream &Out,
CodeGenFileType FileType,
bool DisableVerify);
public:
void dumpCode() { mDump = true; }
bool shouldDumpCode() const { return mDump; }
};
} /* End namespace llvm */
#endif /* AMDGPU_TARGET_MACHINE_H */

@@ -0,0 +1,127 @@
//===-- AMDGPUUtil.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#include "AMDGPUUtil.h"
#include "AMDGPURegisterInfo.h"
#include "AMDIL.h"
#include "AMDILMachineFunctionInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetRegisterInfo.h"
using namespace llvm;
/* Some instructions act as placeholders to emulate operations that the GPU
 * hardware performs automatically. This function can be used to check if
 * an opcode falls into this category. */
bool llvm::isPlaceHolderOpcode(unsigned opcode)
{
switch (opcode) {
default: return false;
case AMDIL::EXPORT_REG:
case AMDIL::RETURN:
case AMDIL::LOAD_INPUT:
case AMDIL::LAST:
case AMDIL::RESERVE_REG:
return true;
}
}
bool llvm::isTransOp(unsigned opcode)
{
switch(opcode) {
default: return false;
case AMDIL::COS_f32:
case AMDIL::COS_r600:
case AMDIL::COS_eg:
case AMDIL::RSQ_f32:
case AMDIL::FTOI:
case AMDIL::ITOF:
case AMDIL::MULLIT:
case AMDIL::MUL_LIT_r600:
case AMDIL::MUL_LIT_eg:
case AMDIL::SHR_i32:
case AMDIL::SIN_f32:
case AMDIL::EXP_f32:
case AMDIL::EXP_IEEE_r600:
case AMDIL::EXP_IEEE_eg:
case AMDIL::LOG_CLAMPED_r600:
case AMDIL::LOG_IEEE_r600:
case AMDIL::LOG_CLAMPED_eg:
case AMDIL::LOG_IEEE_eg:
case AMDIL::LOG_f32:
return true;
}
}
bool llvm::isTexOp(unsigned opcode)
{
switch(opcode) {
default: return false;
case AMDIL::TEX_SAMPLE:
case AMDIL::TEX_SAMPLE_C:
case AMDIL::TEX_SAMPLE_L:
case AMDIL::TEX_SAMPLE_C_L:
case AMDIL::TEX_SAMPLE_LB:
case AMDIL::TEX_SAMPLE_C_LB:
case AMDIL::TEX_SAMPLE_G:
case AMDIL::TEX_SAMPLE_C_G:
return true;
}
}
bool llvm::isReductionOp(unsigned opcode)
{
switch(opcode) {
default: return false;
case AMDIL::DOT4_r600:
case AMDIL::DOT4_eg:
return true;
}
}
bool llvm::isFCOp(unsigned opcode)
{
switch(opcode) {
default: return false;
case AMDIL::BREAK_LOGICALZ_f32:
case AMDIL::BREAK_LOGICALNZ_i32:
case AMDIL::BREAK_LOGICALZ_i32:
case AMDIL::CONTINUE_LOGICALNZ_f32:
case AMDIL::IF_LOGICALNZ_i32:
case AMDIL::IF_LOGICALZ_f32:
case AMDIL::ELSE:
case AMDIL::ENDIF:
case AMDIL::ENDLOOP:
case AMDIL::IF_LOGICALNZ_f32:
case AMDIL::WHILELOOP:
return true;
}
}
void AMDGPU::utilAddLiveIn(MachineFunction * MF, MachineRegisterInfo & MRI,
const struct TargetInstrInfo * TII, unsigned physReg, unsigned virtReg)
{
if (!MRI.isLiveIn(physReg)) {
MRI.addLiveIn(physReg, virtReg);
BuildMI(MF->front(), MF->front().begin(), DebugLoc(),
TII->get(TargetOpcode::COPY), virtReg)
.addReg(physReg);
} else {
MRI.replaceRegWith(virtReg, MRI.getLiveInVirtReg(physReg));
}
}

@@ -0,0 +1,49 @@
//===-- AMDGPUUtil.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// TODO: Add full description
//
//===----------------------------------------------------------------------===//
#ifndef AMDGPU_UTIL_H
#define AMDGPU_UTIL_H
#include "AMDGPURegisterInfo.h"
#include "llvm/Support/DataTypes.h"
namespace llvm {
class AMDILMachineFunctionInfo;
class TargetMachine;
class TargetRegisterInfo;
bool isPlaceHolderOpcode(unsigned opcode);
bool isTransOp(unsigned opcode);
bool isTexOp(unsigned opcode);
bool isReductionOp(unsigned opcode);
bool isFCOp(unsigned opcode);
/* XXX: Move these to AMDGPUInstrInfo.h */
#define MO_FLAG_CLAMP (1 << 0)
#define MO_FLAG_NEG (1 << 1)
#define MO_FLAG_ABS (1 << 2)
#define MO_FLAG_MASK (1 << 3)
} /* End namespace llvm */
namespace AMDGPU {
void utilAddLiveIn(llvm::MachineFunction * MF, llvm::MachineRegisterInfo & MRI,
const struct llvm::TargetInstrInfo * TII, unsigned physReg, unsigned virtReg);
} // End namespace AMDGPU
#endif /* AMDGPU_UTIL_H */

@@ -0,0 +1,292 @@
//===-- AMDIL.h - Top-level interface for AMDIL representation --*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file contains the entry points for global functions defined in the LLVM
// AMDIL back-end.
//
//===----------------------------------------------------------------------===//
#ifndef AMDIL_H_
#define AMDIL_H_
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Target/TargetMachine.h"
#define AMDIL_MAJOR_VERSION 2
#define AMDIL_MINOR_VERSION 0
#define AMDIL_REVISION_NUMBER 74
#define ARENA_SEGMENT_RESERVED_UAVS 12
#define DEFAULT_ARENA_UAV_ID 8
#define DEFAULT_RAW_UAV_ID 7
#define GLOBAL_RETURN_RAW_UAV_ID 11
#define HW_MAX_NUM_CB 8
#define MAX_NUM_UNIQUE_UAVS 8
#define OPENCL_MAX_NUM_ATOMIC_COUNTERS 8
#define OPENCL_MAX_READ_IMAGES 128
#define OPENCL_MAX_WRITE_IMAGES 8
#define OPENCL_MAX_SAMPLERS 16
// The next two values can never be zero, as zero is the ID that is
// used to assert against.
#define DEFAULT_LDS_ID 1
#define DEFAULT_GDS_ID 1
#define DEFAULT_SCRATCH_ID 1
#define DEFAULT_VEC_SLOTS 8
// SC->CAL version matchings.
#define CAL_VERSION_SC_150 1700
#define CAL_VERSION_SC_149 1700
#define CAL_VERSION_SC_148 1525
#define CAL_VERSION_SC_147 1525
#define CAL_VERSION_SC_146 1525
#define CAL_VERSION_SC_145 1451
#define CAL_VERSION_SC_144 1451
#define CAL_VERSION_SC_143 1441
#define CAL_VERSION_SC_142 1441
#define CAL_VERSION_SC_141 1420
#define CAL_VERSION_SC_140 1400
#define CAL_VERSION_SC_139 1387
#define CAL_VERSION_SC_138 1387
#define CAL_APPEND_BUFFER_SUPPORT 1340
#define CAL_VERSION_SC_137 1331
#define CAL_VERSION_SC_136 982
#define CAL_VERSION_SC_135 950
#define CAL_VERSION_GLOBAL_RETURN_BUFFER 990
#define OCL_DEVICE_RV710 0x0001
#define OCL_DEVICE_RV730 0x0002
#define OCL_DEVICE_RV770 0x0004
#define OCL_DEVICE_CEDAR 0x0008
#define OCL_DEVICE_REDWOOD 0x0010
#define OCL_DEVICE_JUNIPER 0x0020
#define OCL_DEVICE_CYPRESS 0x0040
#define OCL_DEVICE_CAICOS 0x0080
#define OCL_DEVICE_TURKS 0x0100
#define OCL_DEVICE_BARTS 0x0200
#define OCL_DEVICE_CAYMAN 0x0400
#define OCL_DEVICE_ALL 0x3FFF
/// The number of function IDs that are reserved for
/// internal compiler usage.
const unsigned int RESERVED_FUNCS = 1024;
#define AMDIL_OPT_LEVEL_DECL
#define AMDIL_OPT_LEVEL_VAR
#define AMDIL_OPT_LEVEL_VAR_NO_COMMA
namespace llvm {
class AMDILInstrPrinter;
class AMDILTargetMachine;
class FunctionPass;
class MCAsmInfo;
class raw_ostream;
class Target;
class TargetMachine;
/// Instruction selection passes.
FunctionPass*
createAMDILISelDag(AMDILTargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILBarrierDetect(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILPrintfConvert(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILInlinePass(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILPeepholeOpt(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
/// Pre regalloc passes.
FunctionPass*
createAMDILPointerManager(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILMachinePeephole(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
/// Pre emit passes.
FunctionPass*
createAMDILCFGPreparationPass(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILCFGStructurizerPass(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILLiteralManager(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
FunctionPass*
createAMDILIOExpansion(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
extern Target TheAMDILTarget;
extern Target TheAMDGPUTarget;
} // end namespace llvm;
#define GET_REGINFO_ENUM
#include "AMDILGenRegisterInfo.inc"
#define GET_INSTRINFO_ENUM
#include "AMDILGenInstrInfo.inc"
/// Include device information enumerations
#include "AMDILDeviceInfo.h"
namespace llvm {
/// OpenCL uses address spaces to differentiate between
/// various memory regions on the hardware. On the CPU
/// all of the address spaces point to the same memory,
/// however on the GPU, each address space points to
/// a separate piece of memory that is distinct from other
/// memory locations.
namespace AMDILAS {
enum AddressSpaces {
PRIVATE_ADDRESS = 0, // Address space for private memory.
GLOBAL_ADDRESS = 1, // Address space for global memory (RAT0, VTX0).
CONSTANT_ADDRESS = 2, // Address space for constant memory.
LOCAL_ADDRESS = 3, // Address space for local memory.
REGION_ADDRESS = 4, // Address space for region memory.
ADDRESS_NONE = 5, // Address space for unknown memory.
PARAM_D_ADDRESS = 6, // Address space for directly addressable parameter memory (CONST0).
PARAM_I_ADDRESS = 7, // Address space for indirectly addressable parameter memory (VTX1).
LAST_ADDRESS = 8
};
// We are piggybacking on the CommentFlag enum in MachineInstr.h to
// set bits in AsmPrinterFlags of the MachineInstruction. We will
// start at bit 16 and allocate down while LLVM will start at bit
// 1 and allocate up.
// This union/struct combination is an easy way to read out the
// exact bits that are needed.
typedef union ResourceRec {
struct {
#ifdef __BIG_ENDIAN__
unsigned short isImage : 1; // Reserved for future use/llvm.
unsigned short ResourceID : 10; // Flag to specify the resource ID for
// the op.
unsigned short HardwareInst : 1; // Flag to specify that this instruction
// is a hardware instruction.
unsigned short ConflictPtr : 1; // Flag to specify that the pointer has a
// conflict.
unsigned short ByteStore : 1; // Flag to specify if the op is a byte
// store op.
unsigned short PointerPath : 1; // Flag to specify if the op is on the
// pointer path.
unsigned short CacheableRead : 1; // Flag to specify if the read is
// cacheable.
#else
unsigned short CacheableRead : 1; // Flag to specify if the read is
// cacheable.
unsigned short PointerPath : 1; // Flag to specify if the op is on the
// pointer path.
unsigned short ByteStore : 1; // Flag to specify if the op is a byte
// store op.
unsigned short ConflictPtr : 1; // Flag to specify that the pointer has
// a conflict.
unsigned short HardwareInst : 1; // Flag to specify that this instruction
// is a hardware instruction.
unsigned short ResourceID : 10; // Flag to specify the resource ID for
// the op.
unsigned short isImage : 1; // Reserved for future use.
#endif
} bits;
unsigned short u16all;
} InstrResEnc;
} // namespace AMDILAS
// The OpSwizzle encodes a subset of all possible
// swizzle combinations into a number of bits using
// only the combinations utilized by the backend.
// The lower 128 are for source swizzles and the
// upper 128 are for destination swizzles.
// The valid mappings can be found in the
// getSrcSwizzle and getDstSwizzle functions of
// AMDILUtilityFunctions.cpp.
typedef union SwizzleRec {
struct {
#ifdef __BIG_ENDIAN__
unsigned char dst : 1;
unsigned char swizzle : 7;
#else
unsigned char swizzle : 7;
unsigned char dst : 1;
#endif
} bits;
unsigned char u8all;
} OpSwizzle;
// Enums corresponding to AMDIL condition codes for IL. These
// values must be kept in sync with the ones in the .td file.
namespace AMDILCC {
enum CondCodes {
// AMDIL specific condition codes. These correspond to the IL_CC_*
// in AMDILInstrInfo.td and must be kept in the same order.
IL_CC_D_EQ = 0, // DEQ instruction.
IL_CC_D_GE = 1, // DGE instruction.
IL_CC_D_LT = 2, // DLT instruction.
IL_CC_D_NE = 3, // DNE instruction.
IL_CC_F_EQ = 4, // EQ instruction.
IL_CC_F_GE = 5, // GE instruction.
IL_CC_F_LT = 6, // LT instruction.
IL_CC_F_NE = 7, // NE instruction.
IL_CC_I_EQ = 8, // IEQ instruction.
IL_CC_I_GE = 9, // IGE instruction.
IL_CC_I_LT = 10, // ILT instruction.
IL_CC_I_NE = 11, // INE instruction.
IL_CC_U_GE = 12, // UGE instruction.
IL_CC_U_LT = 13, // ULT instruction.
// Pseudo IL Comparison instructions here.
IL_CC_F_GT = 14, // GT instruction.
IL_CC_U_GT = 15,
IL_CC_I_GT = 16,
IL_CC_D_GT = 17,
IL_CC_F_LE = 18, // LE instruction.
IL_CC_U_LE = 19,
IL_CC_I_LE = 20,
IL_CC_D_LE = 21,
IL_CC_F_UNE = 22,
IL_CC_F_UEQ = 23,
IL_CC_F_ULT = 24,
IL_CC_F_UGT = 25,
IL_CC_F_ULE = 26,
IL_CC_F_UGE = 27,
IL_CC_F_ONE = 28,
IL_CC_F_OEQ = 29,
IL_CC_F_OLT = 30,
IL_CC_F_OGT = 31,
IL_CC_F_OLE = 32,
IL_CC_F_OGE = 33,
IL_CC_D_UNE = 34,
IL_CC_D_UEQ = 35,
IL_CC_D_ULT = 36,
IL_CC_D_UGT = 37,
IL_CC_D_ULE = 38,
IL_CC_D_UGE = 39,
IL_CC_D_ONE = 40,
IL_CC_D_OEQ = 41,
IL_CC_D_OLT = 42,
IL_CC_D_OGT = 43,
IL_CC_D_OLE = 44,
IL_CC_D_OGE = 45,
IL_CC_U_EQ = 46,
IL_CC_U_NE = 47,
IL_CC_F_O = 48,
IL_CC_D_O = 49,
IL_CC_F_UO = 50,
IL_CC_D_UO = 51,
IL_CC_L_LE = 52,
IL_CC_L_GE = 53,
IL_CC_L_EQ = 54,
IL_CC_L_NE = 55,
IL_CC_L_LT = 56,
IL_CC_L_GT = 57,
IL_CC_UL_LE = 58,
IL_CC_UL_GE = 59,
IL_CC_UL_EQ = 60,
IL_CC_UL_NE = 61,
IL_CC_UL_LT = 62,
IL_CC_UL_GT = 63,
COND_ERROR = 64
};
} // end namespace AMDILCC
} // end namespace llvm
#endif // AMDIL_H_

@@ -0,0 +1,19 @@
//===-- AMDIL.td - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// This file specifies where the base TD file exists
// and where the version specific TD file exists.
include "AMDILBase.td"
include "AMDILVersion.td"
include "R600Schedule.td"
include "SISchedule.td"
include "Processors.td"
include "AMDGPUIntrinsics.td"
include "AMDGPURegisterInfo.td"
include "AMDGPUInstructions.td"

@@ -0,0 +1,723 @@
//===-- AMDIL789IOExpansion.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// @file AMDIL789IOExpansion.cpp
// @details Implementation of the IO expansion class for 789 devices.
//
#include "AMDILCompilerErrors.h"
#include "AMDILCompilerWarnings.h"
#include "AMDILDevices.h"
#include "AMDILGlobalManager.h"
#include "AMDILIOExpansion.h"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILTargetMachine.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Support/DebugLoc.h"
#include "llvm/Value.h"
using namespace llvm;
AMDIL789IOExpansion::AMDIL789IOExpansion(TargetMachine &tm
AMDIL_OPT_LEVEL_DECL)
: AMDILIOExpansion(tm AMDIL_OPT_LEVEL_VAR)
{
}
AMDIL789IOExpansion::~AMDIL789IOExpansion() {
}
const char *AMDIL789IOExpansion::getPassName() const
{
return "AMDIL 789 IO Expansion Pass";
}
// This code produces the following pseudo-IL:
// mov r1007, $src.y000
// cmov_logical r1007.x___, $flag.yyyy, r1007.xxxx, $src.xxxx
// mov r1006, $src.z000
// cmov_logical r1007.x___, $flag.zzzz, r1006.xxxx, r1007.xxxx
// mov r1006, $src.w000
// cmov_logical $dst.x___, $flag.wwww, r1006.xxxx, r1007.xxxx
void
AMDIL789IOExpansion::emitComponentExtract(MachineInstr *MI,
unsigned flag, unsigned src, unsigned dst, bool before)
{
MachineBasicBlock::iterator I = *MI;
DebugLoc DL = MI->getDebugLoc();
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VEXTRACT_v4i32), AMDIL::R1007)
.addReg(src)
.addImm(2);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_Y_i32), AMDIL::R1007)
.addReg(flag)
.addReg(AMDIL::R1007)
.addReg(src);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VEXTRACT_v4i32), AMDIL::R1006)
.addReg(src)
.addImm(3);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_Z_i32), AMDIL::R1007)
.addReg(flag)
.addReg(AMDIL::R1006)
.addReg(AMDIL::R1007);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VEXTRACT_v4i32), AMDIL::R1006)
.addReg(src)
.addImm(4);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_W_i32), dst)
.addReg(flag)
.addReg(AMDIL::R1006)
.addReg(AMDIL::R1007);
}
// We have a 128 bit load but an 8/16/32 bit value, so we need to
// select the correct component and make sure that the correct
// bits are selected. For the 8 and 16 bit cases we need to
// extract the correct bits from the component; for 32 bits
// we just need to select the correct component.
void
AMDIL789IOExpansion::emitDataLoadSelect(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
DebugLoc DL = MI->getDebugLoc();
emitComponentExtract(MI, AMDIL::R1008, AMDIL::R1011, AMDIL::R1011, false);
if (getMemorySize(MI) == 1) {
// This produces the following pseudo-IL:
// iand r1006.x___, r1010.xxxx, l14.xxxx
// mov r1006, r1006.xxxx
// iadd r1006, r1006, {0, -1, -2, -3}
// ieq r1008, r1006, 0
// mov r1011, r1011.xxxx
// ishr r1011, r1011, {0, 8, 16, 24}
// mov r1007, r1011.y000
// cmov_logical r1007.x___, r1008.yyyy, r1007.xxxx, r1011.xxxx
// mov r1006, r1011.z000
// cmov_logical r1007.x___, r1008.zzzz, r1006.xxxx, r1007.xxxx
// mov r1006, r1011.w000
// cmov_logical r1011.x___, r1008.wwww, r1006.xxxx, r1007.xxxx
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1006)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1006)
.addReg(AMDIL::R1006);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1006)
.addReg(AMDIL::R1006)
.addImm(mMFI->addi128Literal(0xFFFFFFFFULL << 32,
(0xFFFFFFFEULL | (0xFFFFFFFDULL << 32))));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::IEQ_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1006)
.addImm(mMFI->addi32Literal(0));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHRVEC_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi128Literal(8ULL << 32, 16ULL | (24ULL << 32)));
emitComponentExtract(MI, AMDIL::R1008, AMDIL::R1011, AMDIL::R1011, false);
} else if (getMemorySize(MI) == 2) {
// This produces the following pseudo-IL:
// ishr r1007.x___, r1010.xxxx, 1
// iand r1008.x___, r1007.xxxx, 1
// ishr r1007.x___, r1011.xxxx, 16
// cmov_logical r1011.x___, r1008.xxxx, r1007.xxxx, r1011.xxxx
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1007)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1007)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1011)
.addReg(AMDIL::R1008)
.addReg(AMDIL::R1007)
.addReg(AMDIL::R1011);
}
}
// This function modifies the address calculation so that the load
// comes from a vector register type instead of a dword addressed load.
void
AMDIL789IOExpansion::emitVectorAddressCalc(MachineInstr *MI, bool is32bit, bool needsSelect)
{
MachineBasicBlock::iterator I = *MI;
DebugLoc DL = MI->getDebugLoc();
// This produces the following pseudo-IL:
// ishr r1007.x___, r1010.xxxx, (is32bit) ? 2 : 3
// iand r1008.x___, r1007.xxxx, (is32bit) ? 3 : 1
// ishr r1007.x___, r1007.xxxx, (is32bit) ? 2 : 1
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1007)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal((is32bit) ? 0x2 : 3));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1007)
.addImm(mMFI->addi32Literal((is32bit) ? 3 : 1));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1007)
.addReg(AMDIL::R1007)
.addImm(mMFI->addi32Literal((is32bit) ? 2 : 1));
if (needsSelect) {
// If the component selection is required, the following
// pseudo-IL is produced.
// mov r1008, r1008.xxxx
// iadd r1008, r1008, (is32bit) ? {0, -1, -2, -3} : {0, 0, -1, -1}
// ieq r1008, r1008, 0
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi128Literal((is32bit) ? 0xFFFFFFFFULL << 32 : 0ULL,
(is32bit) ? 0xFFFFFFFEULL | (0xFFFFFFFDULL << 32) :
-1ULL));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::IEQ_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(0));
}
}
// This function emits a switch statement that writes a 32bit/64bit
// value into the correct component(s) of a 128bit vector register.
void
AMDIL789IOExpansion::emitVectorSwitchWrite(MachineInstr *MI, bool is32bit)
{
MachineBasicBlock::iterator I = *MI;
uint32_t xID = getPointerID(MI);
assert(xID && "Found a scratch store that was incorrectly marked as zero ID!\n");
// This section generates the following pseudo-IL:
// switch r1008.x
// default
// mov x1[r1007.x].(is32bit) ? x___ : xy__, r1011.x{y}
// break
// case 1
// mov x1[r1007.x].(is32bit) ? _y__ : __zw, r1011.x{yxy}
// break
// If is32bit is true, cases 2 and 3 are also emitted.
// case 2
// mov x1[r1007.x].__z_, r1011.x
// break
// case 3
// mov x1[r1007.x].___w, r1011.x
// break
// endswitch
DebugLoc DL;
BuildMI(*mBB, I, MI->getDebugLoc(), mTII->get(AMDIL::SWITCH))
.addReg(AMDIL::R1008);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::DEFAULT));
BuildMI(*mBB, I, DL,
mTII->get((is32bit) ? AMDIL::SCRATCHSTORE_X : AMDIL::SCRATCHSTORE_XY)
, AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(xID);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BREAK));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CASE)).addImm(1);
BuildMI(*mBB, I, DL,
mTII->get((is32bit) ? AMDIL::SCRATCHSTORE_Y : AMDIL::SCRATCHSTORE_ZW), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(xID);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BREAK));
if (is32bit) {
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CASE)).addImm(2);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHSTORE_Z), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(xID);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BREAK));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CASE)).addImm(3);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHSTORE_W), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(xID);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BREAK));
}
BuildMI(*mBB, I, DL, mTII->get(AMDIL::ENDSWITCH));
}
void
AMDIL789IOExpansion::expandPrivateLoad(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
bool HWPrivate = mSTM->device()->usesHardware(AMDILDeviceInfo::PrivateMem);
if (!HWPrivate || mSTM->device()->isSupported(AMDILDeviceInfo::PrivateUAV)) {
return expandGlobalLoad(MI);
}
if (!mMFI->usesMem(AMDILDevice::SCRATCH_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
uint32_t xID = getPointerID(MI);
assert(xID && "Found a scratch load that was incorrectly marked as zero ID!\n");
if (!xID) {
xID = mSTM->device()->getResourceID(AMDILDevice::SCRATCH_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
DebugLoc DL;
// These instructions go before the current MI.
expandLoadStartCode(MI);
switch (getMemorySize(MI)) {
default:
// Since private registers are 128 bit aligned but our source address
// is only 32 bit aligned, we have to align the address first and then
// load the data.
// This produces the following pseudo-IL:
// ishr r1010.x___, r1010.xxxx, 4
// mov r1011, x1[r1010.x]
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SHR_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(4));
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(xID);
break;
case 1:
case 2:
case 4:
emitVectorAddressCalc(MI, true, true);
// This produces the following pseudo-IL:
// mov r1011, x1[r1007.x]
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHLOAD), AMDIL::R1011)
.addReg(AMDIL::R1007)
.addImm(xID);
// These instructions go after the current MI.
emitDataLoadSelect(MI);
break;
case 8:
emitVectorAddressCalc(MI, false, true);
// This produces the following pseudo-IL:
// mov r1011, x1[r1007.x]
// mov r1007, r1011.zw00
// cmov_logical r1011.xy__, r1008.xxxx, r1011.xy, r1007.zw
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHLOAD), AMDIL::R1011)
.addReg(AMDIL::R1007)
.addImm(xID);
// These instructions go after the current MI.
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::VEXTRACT_v2i64), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(2);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::CMOVLOG_i64), AMDIL::R1011)
.addReg(AMDIL::R1008)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1007);
break;
}
expandPackedData(MI);
expandExtendLoad(MI);
BuildMI(*mBB, I, MI->getDebugLoc(),
mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)),
MI->getOperand(0).getReg())
.addReg(AMDIL::R1011);
}
void
AMDIL789IOExpansion::expandConstantLoad(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
if (!isHardwareInst(MI) || MI->memoperands_empty()) {
return expandGlobalLoad(MI);
}
uint32_t cID = getPointerID(MI);
if (cID < 2) {
return expandGlobalLoad(MI);
}
if (!mMFI->usesMem(AMDILDevice::CONSTANT_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
DebugLoc DL;
// These instructions go before the current MI.
expandLoadStartCode(MI);
switch (getMemorySize(MI)) {
default:
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SHR_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(4));
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::CBLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(cID);
break;
case 1:
case 2:
case 4:
emitVectorAddressCalc(MI, true, true);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::CBLOAD), AMDIL::R1011)
.addReg(AMDIL::R1007)
.addImm(cID);
// These instructions go after the current MI.
emitDataLoadSelect(MI);
break;
case 8:
emitVectorAddressCalc(MI, false, true);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::CBLOAD), AMDIL::R1011)
.addReg(AMDIL::R1007)
.addImm(cID);
// These instructions go after the current MI.
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::VEXTRACT_v2i64), AMDIL::R1007)
.addReg(AMDIL::R1011)
.addImm(2);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::VCREATE_v2i32), AMDIL::R1008)
.addReg(AMDIL::R1008);
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::CMOVLOG_i64), AMDIL::R1011)
.addReg(AMDIL::R1008)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1007);
break;
}
expandPackedData(MI);
expandExtendLoad(MI);
BuildMI(*mBB, I, MI->getDebugLoc(),
mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)),
MI->getOperand(0).getReg())
.addReg(AMDIL::R1011);
MI->getOperand(0).setReg(AMDIL::R1011);
}
void
AMDIL789IOExpansion::expandConstantPoolLoad(MachineInstr *MI)
{
if (!isStaticCPLoad(MI)) {
return expandConstantLoad(MI);
} else {
uint32_t idx = MI->getOperand(1).getIndex();
const MachineConstantPool *MCP = MI->getParent()->getParent()
->getConstantPool();
const std::vector<MachineConstantPoolEntry> &consts
= MCP->getConstants();
const Constant *C = consts[idx].Val.ConstVal;
emitCPInst(MI, C, mKM, 0, isExtendLoad(MI));
}
}
void
AMDIL789IOExpansion::expandPrivateStore(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
bool HWPrivate = mSTM->device()->usesHardware(AMDILDeviceInfo::PrivateMem);
if (!HWPrivate || mSTM->device()->isSupported(AMDILDeviceInfo::PrivateUAV)) {
return expandGlobalStore(MI);
}
if (!mMFI->usesMem(AMDILDevice::SCRATCH_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
uint32_t xID = getPointerID(MI);
assert(xID && "Found a scratch store that was incorrectly marked as zero ID!\n");
if (!xID) {
xID = mSTM->device()->getResourceID(AMDILDevice::SCRATCH_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
DebugLoc DL;
// These instructions go before the current MI.
expandStoreSetupCode(MI);
switch (getMemorySize(MI)) {
default:
// This section generates the following pseudo-IL:
// ishr r1010.x___, r1010.xxxx, 4
// mov x1[r1010.x], r1011
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SHR_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(4));
BuildMI(*mBB, I, MI->getDebugLoc(),
mTII->get(AMDIL::SCRATCHSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(xID);
break;
case 1:
emitVectorAddressCalc(MI, true, true);
// This section generates the following pseudo-IL:
// mov r1002, x1[r1007.x]
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHLOAD), AMDIL::R1002)
.addReg(AMDIL::R1007)
.addImm(xID);
emitComponentExtract(MI, AMDIL::R1008, AMDIL::R1002, AMDIL::R1002, true);
// This section generates the following pseudo-IL:
// iand r1003.x, r1010.x, 3
// mov r1003, r1003.xxxx
// iadd r1001, r1003, {0, -1, -2, -3}
// ieq r1001, r1001, 0
// mov r1002, r1002.xxxx
// ishr r1002, r1002, {0, 8, 16, 24}
// mov r1011, r1011.xxxx
// cmov_logical r1002, r1001, r1011, r1002
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1003)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1003)
.addReg(AMDIL::R1003);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1001)
.addReg(AMDIL::R1003)
.addImm(mMFI->addi128Literal(0xFFFFFFFFULL << 32,
(0xFFFFFFFEULL | (0xFFFFFFFDULL << 32))));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::IEQ_v4i32), AMDIL::R1001)
.addReg(AMDIL::R1001)
.addImm(mMFI->addi32Literal(0));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1002)
.addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHRVEC_v4i32), AMDIL::R1002)
.addReg(AMDIL::R1002)
.addImm(mMFI->addi128Literal(8ULL << 32, 16ULL | (24ULL << 32)));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_v4i32), AMDIL::R1002)
.addReg(AMDIL::R1001)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1002);
if (mSTM->device()->getGeneration() == AMDILDeviceInfo::HD4XXX) {
// This section generates the following pseudo-IL:
// iand r1002, r1002, 0xFF
// ishl r1002, r1002, {0, 8, 16, 24}
// ior r1002.xy, r1002.xy, r1002.zw
// ior r1011.x, r1002.x, r1002.y
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_v4i32), AMDIL::R1002)
.addReg(AMDIL::R1002)
.addImm(mMFI->addi32Literal(0xFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_v4i32), AMDIL::R1002)
.addReg(AMDIL::R1002)
.addImm(mMFI->addi128Literal(8ULL << 32, 16ULL | (24ULL << 32)));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i64), AMDIL::R1002)
.addReg(AMDIL::R1002).addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1002).addReg(AMDIL::R1002);
} else {
// This section generates the following pseudo-IL:
// mov r1001.xy, r1002.yw
// mov r1002.xy, r1002.xz
// ubit_insert r1002.xy, 8, 8, r1001.xy, r1002.xy
// mov r1001.x, r1002.y
// ubit_insert r1011.x, 16, 16, r1002.y, r1002.x
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LHI_v2i64), AMDIL::R1001)
.addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LLO_v2i64), AMDIL::R1002)
.addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::UBIT_INSERT_v2i32), AMDIL::R1002)
.addImm(mMFI->addi32Literal(8))
.addImm(mMFI->addi32Literal(8))
.addReg(AMDIL::R1001)
.addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LHI), AMDIL::R1001)
.addReg(AMDIL::R1002);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::UBIT_INSERT_i32), AMDIL::R1011)
.addImm(mMFI->addi32Literal(16))
.addImm(mMFI->addi32Literal(16))
.addReg(AMDIL::R1001)
.addReg(AMDIL::R1002);
}
emitVectorAddressCalc(MI, true, false);
emitVectorSwitchWrite(MI, true);
break;
case 2:
emitVectorAddressCalc(MI, true, true);
// This section generates the following pseudo-IL:
// mov r1002, x1[r1007.x]
BuildMI(*mBB, I, DL,
mTII->get(AMDIL::SCRATCHLOAD), AMDIL::R1002)
.addReg(AMDIL::R1007)
.addImm(xID);
emitComponentExtract(MI, AMDIL::R1008, AMDIL::R1002, AMDIL::R1002, true);
// This section generates the following pseudo-IL:
// ishr r1003.x, r1010.x, 1
// iand r1003.x, r1003.x, 1
// ishr r1001.x, r1002.x, 16
// cmov_logical r1002.x, r1003.x, r1002.x, r1011.x
// cmov_logical r1001.x, r1003.x, r1011.x, r1001.x
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1003)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1003)
.addReg(AMDIL::R1003)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1001)
.addReg(AMDIL::R1002)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1002)
.addReg(AMDIL::R1003)
.addReg(AMDIL::R1002)
.addReg(AMDIL::R1011);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1001)
.addReg(AMDIL::R1003)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1001);
if (mSTM->device()->getGeneration() == AMDILDeviceInfo::HD4XXX) {
// This section generates the following pseudo-IL:
// iand r1002.x, r1002.x, 0xFFFF
// iand r1001.x, r1001.x, 0xFFFF
// ishl r1001.x, r1002.x, 16
// ior r1011.x, r1002.x, r1001.x
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1002)
.addReg(AMDIL::R1002)
.addImm(mMFI->addi32Literal(0xFFFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1001)
.addReg(AMDIL::R1001)
.addImm(mMFI->addi32Literal(0xFFFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1001)
.addReg(AMDIL::R1001)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_OR_i32), AMDIL::R1011)
.addReg(AMDIL::R1002).addReg(AMDIL::R1001);
} else {
// This section generates the following pseudo-IL:
// ubit_insert r1011.x, 16, 16, r1001.y, r1002.x
BuildMI(*mBB, I, DL, mTII->get(AMDIL::UBIT_INSERT_i32), AMDIL::R1011)
.addImm(mMFI->addi32Literal(16))
.addImm(mMFI->addi32Literal(16))
.addReg(AMDIL::R1001)
.addReg(AMDIL::R1002);
}
emitVectorAddressCalc(MI, true, false);
emitVectorSwitchWrite(MI, true);
break;
case 4:
emitVectorAddressCalc(MI, true, false);
emitVectorSwitchWrite(MI, true);
break;
case 8:
emitVectorAddressCalc(MI, false, false);
emitVectorSwitchWrite(MI, false);
break;
};
}
void
AMDIL789IOExpansion::expandStoreSetupCode(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
DebugLoc DL;
if (MI->getOperand(0).isUndef()) {
BuildMI(*mBB, I, DL, mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)), AMDIL::R1011)
.addImm(mMFI->addi32Literal(0));
} else {
BuildMI(*mBB, I, DL, mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)), AMDIL::R1011)
.addReg(MI->getOperand(0).getReg());
}
expandTruncData(MI);
if (MI->getOperand(2).isReg()) {
BuildMI(*mBB, I, DL, mTII->get(AMDIL::ADD_i32), AMDIL::R1010)
.addReg(MI->getOperand(1).getReg())
.addReg(MI->getOperand(2).getReg());
} else {
BuildMI(*mBB, I, DL, mTII->get(AMDIL::MOVE_i32), AMDIL::R1010)
.addReg(MI->getOperand(1).getReg());
}
expandAddressCalc(MI);
expandPackedData(MI);
}
void
AMDIL789IOExpansion::expandPackedData(MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
if (!isPackedData(MI)) {
return;
}
DebugLoc DL;
// If we have packed data, then the shift size is no longer
// the same as the load size, so we need to adjust accordingly.
switch(getPackedID(MI)) {
default:
break;
case PACK_V2I8:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi64Literal(0xFFULL | (0xFFULL << 32)));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011).addImm(mMFI->addi64Literal(8ULL << 32));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1011);
}
break;
case PACK_V4I8:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(0xFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi128Literal(8ULL << 32, (16ULL | (24ULL << 32))));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i64), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1011);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1011);
}
break;
case PACK_V2I16:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(0xFFFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi64Literal(16ULL << 32));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1011);
}
break;
case PACK_V4I16:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::BINARY_AND_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(0xFFFF));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::SHL_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi64Literal(16ULL << 32));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::HILO_BITOR_v4i16), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1011);
}
break;
case UNPACK_V2I8:
BuildMI(*mBB, I, DL, mTII->get(AMDIL::USHRVEC_i32), AMDIL::R1012)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LCREATE), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1012);
break;
case UNPACK_V4I8:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::VCREATE_v4i8), AMDIL::R1011)
.addReg(AMDIL::R1011);
BuildMI(*mBB, I, DL, mTII->get(AMDIL::USHRVEC_v4i8), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi128Literal(8ULL << 32, (16ULL | (24ULL << 32))));
}
break;
case UNPACK_V2I16:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::USHRVEC_i32), AMDIL::R1012)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LCREATE), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1012);
}
break;
case UNPACK_V4I16:
{
BuildMI(*mBB, I, DL, mTII->get(AMDIL::USHRVEC_v2i32), AMDIL::R1012)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, I, DL, mTII->get(AMDIL::LCREATE_v2i64), AMDIL::R1011)
.addReg(AMDIL::R1011).addReg(AMDIL::R1012);
}
break;
};
}


@@ -0,0 +1,157 @@
//===-- AMDIL7XXDevice.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDIL7XXDevice.h"
#ifdef UPSTREAM_LLVM
#include "AMDIL7XXAsmPrinter.h"
#endif
#include "AMDILDevice.h"
#include "AMDILIOExpansion.h"
#include "AMDILPointerManager.h"
using namespace llvm;
AMDIL7XXDevice::AMDIL7XXDevice(AMDILSubtarget *ST) : AMDILDevice(ST)
{
setCaps();
std::string name = mSTM->getDeviceName();
if (name == "rv710") {
mDeviceFlag = OCL_DEVICE_RV710;
} else if (name == "rv730") {
mDeviceFlag = OCL_DEVICE_RV730;
} else {
mDeviceFlag = OCL_DEVICE_RV770;
}
}
AMDIL7XXDevice::~AMDIL7XXDevice()
{
}
void AMDIL7XXDevice::setCaps()
{
mSWBits.set(AMDILDeviceInfo::LocalMem);
}
size_t AMDIL7XXDevice::getMaxLDSSize() const
{
if (usesHardware(AMDILDeviceInfo::LocalMem)) {
return MAX_LDS_SIZE_700;
}
return 0;
}
size_t AMDIL7XXDevice::getWavefrontSize() const
{
return AMDILDevice::HalfWavefrontSize;
}
uint32_t AMDIL7XXDevice::getGeneration() const
{
return AMDILDeviceInfo::HD4XXX;
}
uint32_t AMDIL7XXDevice::getResourceID(uint32_t DeviceID) const
{
switch (DeviceID) {
default:
assert(0 && "ID type passed in is unknown!");
break;
case GLOBAL_ID:
case CONSTANT_ID:
case RAW_UAV_ID:
case ARENA_UAV_ID:
break;
case LDS_ID:
if (usesHardware(AMDILDeviceInfo::LocalMem)) {
return DEFAULT_LDS_ID;
}
break;
case SCRATCH_ID:
if (usesHardware(AMDILDeviceInfo::PrivateMem)) {
return DEFAULT_SCRATCH_ID;
}
break;
case GDS_ID:
assert(0 && "GDS UAV ID is not supported on this chip");
if (usesHardware(AMDILDeviceInfo::RegionMem)) {
return DEFAULT_GDS_ID;
}
break;
};
return 0;
}
uint32_t AMDIL7XXDevice::getMaxNumUAVs() const
{
return 1;
}
FunctionPass*
AMDIL7XXDevice::getIOExpansion(
TargetMachine& TM AMDIL_OPT_LEVEL_DECL) const
{
return new AMDIL7XXIOExpansion(TM AMDIL_OPT_LEVEL_VAR);
}
AsmPrinter*
AMDIL7XXDevice::getAsmPrinter(TargetMachine& TM, MCStreamer &Streamer) const
{
#ifdef UPSTREAM_LLVM
return new AMDIL7XXAsmPrinter(TM, Streamer);
#else
return NULL;
#endif
}
FunctionPass*
AMDIL7XXDevice::getPointerManager(
TargetMachine& TM AMDIL_OPT_LEVEL_DECL) const
{
return new AMDILPointerManager(TM AMDIL_OPT_LEVEL_VAR);
}
AMDIL770Device::AMDIL770Device(AMDILSubtarget *ST): AMDIL7XXDevice(ST)
{
setCaps();
}
AMDIL770Device::~AMDIL770Device()
{
}
void AMDIL770Device::setCaps()
{
if (mSTM->isOverride(AMDILDeviceInfo::DoubleOps)) {
mSWBits.set(AMDILDeviceInfo::FMA);
mHWBits.set(AMDILDeviceInfo::DoubleOps);
}
mSWBits.set(AMDILDeviceInfo::BarrierDetect);
mHWBits.reset(AMDILDeviceInfo::LongOps);
mSWBits.set(AMDILDeviceInfo::LongOps);
mSWBits.set(AMDILDeviceInfo::LocalMem);
}
size_t AMDIL770Device::getWavefrontSize() const
{
return AMDILDevice::WavefrontSize;
}
AMDIL710Device::AMDIL710Device(AMDILSubtarget *ST) : AMDIL7XXDevice(ST)
{
}
AMDIL710Device::~AMDIL710Device()
{
}
size_t AMDIL710Device::getWavefrontSize() const
{
return AMDILDevice::QuarterWavefrontSize;
}


@@ -0,0 +1,77 @@
//==-- AMDIL7XXDevice.h - Define 7XX Device Device for AMDIL ---*- C++ -*--===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface for the subtarget data classes.
//
//===----------------------------------------------------------------------===//
// This file will define the interface that each generation needs to
// implement in order to correctly answer queries on the capabilities of the
// specific hardware.
//===----------------------------------------------------------------------===//
#ifndef _AMDIL7XXDEVICEIMPL_H_
#define _AMDIL7XXDEVICEIMPL_H_
#include "AMDILDevice.h"
#include "AMDILSubtarget.h"
namespace llvm {
class AMDILSubtarget;
//===----------------------------------------------------------------------===//
// 7XX generation of devices and their respective sub classes
//===----------------------------------------------------------------------===//
// The AMDIL7XXDevice class represents the generic 7XX device. All 7XX
// devices are derived from this class. The AMDIL7XX device will only
// support the minimal features that are required to be considered OpenCL 1.0
// compliant and nothing more.
class AMDIL7XXDevice : public AMDILDevice {
public:
AMDIL7XXDevice(AMDILSubtarget *ST);
virtual ~AMDIL7XXDevice();
virtual size_t getMaxLDSSize() const;
virtual size_t getWavefrontSize() const;
virtual uint32_t getGeneration() const;
virtual uint32_t getResourceID(uint32_t DeviceID) const;
virtual uint32_t getMaxNumUAVs() const;
FunctionPass*
getIOExpansion(TargetMachine& AMDIL_OPT_LEVEL_DECL) const;
AsmPrinter*
getAsmPrinter(TargetMachine& TM, MCStreamer &Streamer) const;
FunctionPass*
getPointerManager(TargetMachine& AMDIL_OPT_LEVEL_DECL) const;
protected:
virtual void setCaps();
}; // AMDIL7XXDevice
// The AMDIL770Device class represents the RV770 chip and its
// derivative cards. The difference from the base class is that
// this device adds support for double precision and has a larger
// wavefront size.
class AMDIL770Device : public AMDIL7XXDevice {
public:
AMDIL770Device(AMDILSubtarget *ST);
virtual ~AMDIL770Device();
virtual size_t getWavefrontSize() const;
private:
virtual void setCaps();
}; // AMDIL770Device
// The AMDIL710Device class derives from the 7XX base class, but
// represents a smaller derivative chip, so some of the functions
// are overridden to report the correct information.
class AMDIL710Device : public AMDIL7XXDevice {
public:
AMDIL710Device(AMDILSubtarget *ST);
virtual ~AMDIL710Device();
virtual size_t getWavefrontSize() const;
}; // AMDIL710Device
} // namespace llvm
#endif // _AMDIL7XXDEVICEIMPL_H_


@@ -0,0 +1,548 @@
//===-- AMDIL7XXIOExpansion.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// @file AMDIL7XXIOExpansion.cpp
// @details Implementation of the IO expansion class for 7XX devices.
//
#include "AMDILCompilerErrors.h"
#include "AMDILCompilerWarnings.h"
#include "AMDILDevices.h"
#include "AMDILGlobalManager.h"
#include "AMDILIOExpansion.h"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILTargetMachine.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Support/DebugLoc.h"
#include "llvm/Value.h"
using namespace llvm;
AMDIL7XXIOExpansion::AMDIL7XXIOExpansion(TargetMachine &tm
AMDIL_OPT_LEVEL_DECL) : AMDIL789IOExpansion(tm AMDIL_OPT_LEVEL_VAR)
{
}
AMDIL7XXIOExpansion::~AMDIL7XXIOExpansion() {
}
const char *AMDIL7XXIOExpansion::getPassName() const
{
return "AMDIL 7XX IO Expansion Pass";
}
void
AMDIL7XXIOExpansion::expandGlobalLoad(MachineInstr *MI)
{
DebugLoc DL;
// These instructions go before the current MI.
expandLoadStartCode(MI);
uint32_t ID = getPointerID(MI);
mKM->setOutputInst();
switch(getMemorySize(MI)) {
default:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWLOAD_v4i32), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(ID);
break;
case 4:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWLOAD_i32), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(ID);
break;
case 8:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWLOAD_v2i32), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(ID);
break;
case 1:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi128Literal(0xFFFFFFFFULL << 32,
(0xFFFFFFFEULL | (0xFFFFFFFDULL << 32))));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::IEQ_v4i32), AMDIL::R1012)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(0));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1008)
.addReg(AMDIL::R1012)
.addImm(mMFI->addi32Literal(0))
.addImm(mMFI->addi32Literal(24));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_Y_i32), AMDIL::R1008)
.addReg(AMDIL::R1012)
.addImm(mMFI->addi32Literal(8))
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_Z_i32), AMDIL::R1008)
.addReg(AMDIL::R1012)
.addImm(mMFI->addi32Literal(16))
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWLOAD_i32), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(ID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i8), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
break;
case 2:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(16))
.addImm(mMFI->addi32Literal(0));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWLOAD_i32), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(ID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i16), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
break;
}
// These instructions go after the current MI.
expandPackedData(MI);
expandExtendLoad(MI);
BuildMI(*mBB, MI, MI->getDebugLoc(),
mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)))
.addOperand(MI->getOperand(0))
.addReg(AMDIL::R1011);
MI->getOperand(0).setReg(AMDIL::R1011);
}
void
AMDIL7XXIOExpansion::expandRegionLoad(MachineInstr *MI)
{
bool HWRegion = mSTM->device()->usesHardware(AMDILDeviceInfo::RegionMem);
if (!mSTM->device()->isSupported(AMDILDeviceInfo::RegionMem)) {
mMFI->addErrorMsg(
amd::CompilerErrorMessage[REGION_MEMORY_ERROR]);
return;
}
if (!HWRegion || !isHardwareRegion(MI)) {
return expandGlobalLoad(MI);
}
if (!mMFI->usesMem(AMDILDevice::GDS_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
uint32_t gID = getPointerID(MI);
assert(gID && "Found a GDS load that was incorrectly marked as zero ID!\n");
if (!gID) {
gID = mSTM->device()->getResourceID(AMDILDevice::GDS_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
DebugLoc DL;
// These instructions go before the current MI.
expandLoadStartCode(MI);
switch (getMemorySize(MI)) {
default:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi128Literal(1ULL << 32, 2ULL | (3ULL << 32)));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD_Y), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD_Z), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD_W), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
break;
case 1:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UMUL_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
// The original instruction would normally go right here, so everything
// created after this point needs to go into the afterInst vector.
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(24));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(24));
break;
case 2:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UMUL_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
// The original instruction would normally go right here, so everything
// created after this point needs to go into the afterInst vector.
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
break;
case 4:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
break;
case 8:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v2i32), AMDIL::R1010)
.addReg(AMDIL::R1010);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi64Literal(1ULL << 32));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSLOAD_Y), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(gID);
break;
}
// These instructions go after the current MI.
expandPackedData(MI);
expandExtendLoad(MI);
BuildMI(*mBB, MI, MI->getDebugLoc(),
mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)))
.addOperand(MI->getOperand(0))
.addReg(AMDIL::R1011);
MI->getOperand(0).setReg(AMDIL::R1011);
}
void
AMDIL7XXIOExpansion::expandLocalLoad(MachineInstr *MI)
{
bool HWLocal = mSTM->device()->usesHardware(AMDILDeviceInfo::LocalMem);
if (!HWLocal || !isHardwareLocal(MI)) {
return expandGlobalLoad(MI);
}
if (!mMFI->usesMem(AMDILDevice::LDS_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
uint32_t lID = getPointerID(MI);
assert(lID && "Found a LDS load that was incorrectly marked as zero ID!\n");
if (!lID) {
lID = mSTM->device()->getResourceID(AMDILDevice::LDS_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
DebugLoc DL;
// These instructions go before the current MI.
expandLoadStartCode(MI);
switch (getMemorySize(MI)) {
default:
case 8:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::LDSLOADVEC), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(lID);
break;
case 4:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::LDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(lID);
break;
case 1:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UMUL_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::LDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(lID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(24));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(24));
break;
case 2:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UMUL_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(0xFFFFFFFC));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::LDSLOAD), AMDIL::R1011)
.addReg(AMDIL::R1010)
.addImm(lID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(16));
break;
}
// These instructions go after the current MI.
expandPackedData(MI);
expandExtendLoad(MI);
BuildMI(*mBB, MI, MI->getDebugLoc(),
mTII->get(getMoveInstFromID(
MI->getDesc().OpInfo[0].RegClass)))
.addOperand(MI->getOperand(0))
.addReg(AMDIL::R1011);
MI->getOperand(0).setReg(AMDIL::R1011);
}
void
AMDIL7XXIOExpansion::expandGlobalStore(MachineInstr *MI)
{
uint32_t ID = getPointerID(MI);
mKM->setOutputInst();
DebugLoc DL = MI->getDebugLoc();
// These instructions go before the current MI.
expandStoreSetupCode(MI);
switch (getMemorySize(MI)) {
default:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWSTORE_v4i32), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(ID);
break;
case 1:
mMFI->addErrorMsg(
amd::CompilerErrorMessage[BYTE_STORE_ERROR]);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWSTORE_i32), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(ID);
break;
case 2:
mMFI->addErrorMsg(
amd::CompilerErrorMessage[BYTE_STORE_ERROR]);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWSTORE_i32), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(ID);
break;
case 4:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWSTORE_i32), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(ID);
break;
case 8:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UAVRAWSTORE_v2i32), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(ID);
break;
}
}
void
AMDIL7XXIOExpansion::expandRegionStore(MachineInstr *MI)
{
bool HWRegion = mSTM->device()->usesHardware(AMDILDeviceInfo::RegionMem);
if (!mSTM->device()->isSupported(AMDILDeviceInfo::RegionMem)) {
mMFI->addErrorMsg(
amd::CompilerErrorMessage[REGION_MEMORY_ERROR]);
return;
}
if (!HWRegion || !isHardwareRegion(MI)) {
return expandGlobalStore(MI);
}
DebugLoc DL = MI->getDebugLoc();
mKM->setOutputInst();
if (!mMFI->usesMem(AMDILDevice::GDS_ID)
&& mKM->isKernel()) {
mMFI->addErrorMsg(amd::CompilerErrorMessage[MEMOP_NO_ALLOCATION]);
}
uint32_t gID = getPointerID(MI);
assert(gID && "Found a GDS store that was incorrectly marked as zero ID!\n");
if (!gID) {
gID = mSTM->device()->getResourceID(AMDILDevice::GDS_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
// These instructions go before the current MI.
expandStoreSetupCode(MI);
switch (getMemorySize(MI)) {
default:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi128Literal(1ULL << 32, 2ULL | (3ULL << 32)));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE_Y), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE_Z), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE_W), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
break;
case 1:
mMFI->addErrorMsg(
amd::CompilerErrorMessage[BYTE_STORE_ERROR]);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(0xFF));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1012)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi128Literal(0xFFFFFFFFULL << 32,
(0xFFFFFFFEULL | (0xFFFFFFFDULL << 32))));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::UMUL_i32), AMDIL::R1006)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(8));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1007)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(0xFFFFFF00))
.addImm(mMFI->addi32Literal(0x00FFFFFF));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_Y_i32), AMDIL::R1007)
.addReg(AMDIL::R1008)
.addReg(AMDIL::R1007)
.addImm(mMFI->addi32Literal(0xFF00FFFF));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_Z_i32), AMDIL::R1012)
.addReg(AMDIL::R1008)
.addReg(AMDIL::R1007)
.addImm(mMFI->addi32Literal(0xFFFF00FF));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1007);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
break;
case 2:
mMFI->addErrorMsg(
amd::CompilerErrorMessage[BYTE_STORE_ERROR]);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addImm(mMFI->addi32Literal(0x0000FFFF));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::BINARY_AND_i32), AMDIL::R1008)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi32Literal(3));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHR_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(1));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1012)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(0x0000FFFF))
.addImm(mMFI->addi32Literal(0xFFFF0000));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::CMOVLOG_i32), AMDIL::R1008)
.addReg(AMDIL::R1008)
.addImm(mMFI->addi32Literal(16))
.addImm(mMFI->addi32Literal(0));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::SHL_i32), AMDIL::R1011)
.addReg(AMDIL::R1011)
.addReg(AMDIL::R1008);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
break;
case 4:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
break;
case 8:
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::VCREATE_v2i32), AMDIL::R1010)
.addReg(AMDIL::R1010);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::ADD_v4i32), AMDIL::R1010)
.addReg(AMDIL::R1010)
.addImm(mMFI->addi64Literal(1ULL << 32));
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::GDSSTORE_Y), AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(gID);
break;
}
}
void
AMDIL7XXIOExpansion::expandLocalStore(MachineInstr *MI)
{
bool HWLocal = mSTM->device()->usesHardware(AMDILDeviceInfo::LocalMem);
if (!HWLocal || !isHardwareLocal(MI)) {
return expandGlobalStore(MI);
}
uint32_t lID = getPointerID(MI);
assert(lID && "Found a LDS store that was incorrectly marked as zero ID!\n");
if (!lID) {
lID = mSTM->device()->getResourceID(AMDILDevice::LDS_ID);
mMFI->addErrorMsg(amd::CompilerWarningMessage[RECOVERABLE_ERROR]);
}
DebugLoc DL = MI->getDebugLoc();
// These instructions go before the current MI.
expandStoreSetupCode(MI);
BuildMI(*mBB, MI, DL, mTII->get(AMDIL::LDSSTOREVEC), AMDIL::MEM)
.addReg(AMDIL::R1010)
.addReg(AMDIL::R1011)
.addImm(lID);
}
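The 1- and 2-byte load paths above all follow the same pattern: align the address down to a dword boundary, turn the low address bits into a bit shift, do a dword load, then shift and sign-extend to recover the narrow value. A standalone sketch of that arithmetic (a hypothetical helper, assuming a little-endian buffer; not part of the driver):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Emulates a 1-byte signed load using only dword-aligned reads, mirroring
// the BINARY_AND / SHR / sign-extend sequence emitted above.
static int32_t emulateByteLoad(const uint8_t *mem, uint32_t addr) {
  uint32_t shift = (addr & 3u) * 8u;      // byte offset -> bit shift
  uint32_t aligned = addr & 0xFFFFFFFCu;  // AND with ~3 aligns the address
  uint32_t dword;
  std::memcpy(&dword, mem + aligned, 4);  // the dword-aligned load
  return (int8_t)(dword >> shift);        // narrow cast = sign extension
}

// Same idea for a 2-byte signed load; the halfword index selects a
// shift of 0 or 16 bits.
static int32_t emulateShortLoad(const uint8_t *mem, uint32_t addr) {
  uint32_t shift = ((addr & 3u) >> 1) * 16u;
  uint32_t aligned = addr & 0xFFFFFFFCu;
  uint32_t dword;
  std::memcpy(&dword, mem + aligned, 4);
  return (int16_t)(dword >> shift);
}
```

The SHL-by-24 / SHR-by-24 pairs in the emitted code perform the same sign extension that the narrowing casts do here.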


@ -0,0 +1,93 @@
//===------ AMDILAlgorithms.tpp - AMDIL Template Algorithms Header --------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file provides template algorithms that extend the STL algorithms and
// are useful for the AMDIL backend
//
//===----------------------------------------------------------------------===//
// A template function that loops through the iterators and passes the second
// argument along with each iterator to the function. If the function returns
// true, the current iterator has been invalidated, so the loop steps back
// before advancing to the next iterator; otherwise it simply advances. This
// is based on the STL for_each function, but allows a reference to the
// second argument.
template<class InputIterator, class Function, typename Arg>
Function binaryForEach(InputIterator First, InputIterator Last, Function F,
Arg &Second)
{
for ( ; First!=Last; ++First ) {
F(*First, Second);
}
return F;
}
template<class InputIterator, class Function, typename Arg>
Function safeBinaryForEach(InputIterator First, InputIterator Last, Function F,
Arg &Second)
{
for ( ; First!=Last; ++First ) {
if (F(*First, Second)) {
--First;
}
}
return F;
}
// A template function that has two levels of looping before calling the
// function with the passed in argument. See binaryForEach for further
// explanation
template<class InputIterator, class Function, typename Arg>
Function binaryNestedForEach(InputIterator First, InputIterator Last,
Function F, Arg &Second)
{
for ( ; First != Last; ++First) {
binaryForEach(First->begin(), First->end(), F, Second);
}
return F;
}
template<class InputIterator, class Function, typename Arg>
Function safeBinaryNestedForEach(InputIterator First, InputIterator Last,
Function F, Arg &Second)
{
for ( ; First != Last; ++First) {
safeBinaryForEach(First->begin(), First->end(), F, Second);
}
return F;
}
// Unlike the STL, a pointer to the iterator itself is passed in with the 'safe'
// versions of these functions. This allows the function to handle situations
// such as invalidated iterators.
template<class InputIterator, class Function>
Function safeForEach(InputIterator First, InputIterator Last, Function F)
{
for ( ; First!=Last; ++First ) {
F(&First);
}
return F;
}
// A template function that has two levels of looping before calling the
// function with a pointer to the current iterator. See binaryForEach for
// further explanation
template<class InputIterator, class SecondIterator, class Function>
Function safeNestedForEach(InputIterator First, InputIterator Last,
SecondIterator S, Function F)
{
for ( ; First != Last; ++First) {
SecondIterator sf, sl;
for (sf = First->begin(), sl = First->end();
sf != sl; ) {
if (!F(&sf)) {
++sf;
}
}
}
return F;
}
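The 'safe' variants above implement an erase-during-iteration protocol: the callback gets a pointer to the iterator, and its return value tells the loop whether the iterator was already advanced (e.g. by a container erase). A minimal standalone illustration with std::list (the callback and container here are hypothetical, not backend types):

```cpp
#include <cassert>
#include <list>

// Mirrors the safeNestedForEach protocol: when the callback removes the
// current element it replaces *it with the erase() result and returns true,
// telling the loop not to increment again.
static bool eraseIfEven(std::list<int> &l, std::list<int>::iterator *it) {
  if (**it % 2 == 0) {
    *it = l.erase(*it);  // erase() yields the next valid iterator
    return true;         // iterator already advanced
  }
  return false;          // caller should ++ the iterator
}

static void removeEvens(std::list<int> &l) {
  for (auto it = l.begin(); it != l.end(); ) {
    if (!eraseIfEven(l, &it)) {
      ++it;
    }
  }
}
```

Without this protocol, incrementing an iterator that was invalidated by erase() is undefined behaviour, which is exactly what the 'safe' templates guard against.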


@ -0,0 +1,82 @@
//===------ AMDILAsmBackend.cpp - AMDIL Assembly Backend ---===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
//
#include "AMDILAsmBackend.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;
namespace llvm {
ASM_BACKEND_CLASS* createAMDILAsmBackend(const ASM_BACKEND_CLASS &T,
const std::string &TT)
{
return new AMDILAsmBackend(T);
}
} // namespace llvm
//===--------------------- Default AMDIL Asm Backend ---------------------===//
AMDILAsmBackend::AMDILAsmBackend(const ASM_BACKEND_CLASS &T)
: ASM_BACKEND_CLASS()
{
}
MCObjectWriter *
AMDILAsmBackend::createObjectWriter(raw_ostream &OS) const
{
return 0;
}
bool
AMDILAsmBackend::doesSectionRequireSymbols(const MCSection &Section) const
{
return false;
}
bool
AMDILAsmBackend::isSectionAtomizable(const MCSection &Section) const
{
return true;
}
bool
AMDILAsmBackend::isVirtualSection(const MCSection &Section) const
{
return false;
//const MCSectionELF &SE = static_cast<const MCSectionELF&>(Section);
//return SE.getType() == MCSectionELF::SHT_NOBITS;
}
void
AMDILAsmBackend::ApplyFixup(const MCFixup &Fixup, char *Data, unsigned DataSize,
uint64_t Value) const
{
}
bool
AMDILAsmBackend::MayNeedRelaxation(const MCInst &Inst) const
{
return false;
}
void
AMDILAsmBackend::RelaxInstruction(const MCInst &Inst,
MCInst &Res) const
{
}
bool
AMDILAsmBackend::WriteNopData(uint64_t Count, MCObjectWriter *OW) const
{
return false;
}
unsigned
AMDILAsmBackend::getNumFixupKinds() const
{
return 0;
}


@ -0,0 +1,49 @@
//===-- AMDILAsmBackend.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDIL_ASM_BACKEND_H_
#define _AMDIL_ASM_BACKEND_H_
#include "AMDIL.h"
#include "llvm/MC/MCAsmBackend.h"
#define ASM_BACKEND_CLASS MCAsmBackend
using namespace llvm;
namespace llvm {
class AMDILAsmBackend : public ASM_BACKEND_CLASS {
public:
AMDILAsmBackend(const ASM_BACKEND_CLASS &T);
virtual MCObjectWriter *createObjectWriter(raw_ostream &OS) const;
virtual bool doesSectionRequireSymbols(const MCSection &Section) const;
virtual bool isSectionAtomizable(const MCSection &Section) const;
virtual bool isVirtualSection(const MCSection &Section) const;
virtual void ApplyFixup(const MCFixup &Fixup, char *Data, unsigned DataSize,
uint64_t Value) const;
virtual bool
MayNeedRelaxation(const MCInst &Inst
) const;
virtual void RelaxInstruction(const MCInst &Inst, MCInst &Res) const;
virtual bool WriteNopData(uint64_t Count, MCObjectWriter *OW) const;
unsigned getNumFixupKinds() const;
virtual void applyFixup(const MCFixup &Fixup, char * Data, unsigned DataSize,
uint64_t value) const { }
virtual bool mayNeedRelaxation(const MCInst &Inst) const { return false; }
virtual bool fixupNeedsRelaxation(const MCFixup &fixup, uint64_t value,
const MCInstFragment *DF,
const MCAsmLayout &Layout) const
{ return false; }
virtual void relaxInstruction(const MCInst &Inst, MCInst &Res) const
{}
virtual bool writeNopData(uint64_t data, llvm::MCObjectWriter * writer) const
{ return false; }
}; // class AMDILAsmBackend;
} // llvm namespace
#endif // _AMDIL_ASM_BACKEND_H_


@ -0,0 +1,149 @@
//===-- AMDILAsmPrinter7XX.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDIL7XXAsmPrinter.h"
#include "AMDILAlgorithms.tpp"
#include "AMDILDevices.h"
#include "AMDILGlobalManager.h"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/DebugInfo.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Constants.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Metadata.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugLoc.h"
#include "llvm/Support/InstIterator.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Type.h"
using namespace llvm;
// TODO: Add support for verbose.
AMDIL7XXAsmPrinter::AMDIL7XXAsmPrinter(TargetMachine& TM, MCStreamer &Streamer)
: AMDILAsmPrinter(TM, Streamer)
{
}
AMDIL7XXAsmPrinter::~AMDIL7XXAsmPrinter()
{
}
///
/// @param name
/// @brief Strips KERNEL_PREFIX and KERNEL_SUFFIX from the name
/// and returns the stripped name if both tokens are present.
///
static
std::string Strip(const std::string &name)
{
size_t start = name.find("__OpenCL_");
size_t end = name.find("_kernel");
if (start == std::string::npos
|| end == std::string::npos
|| (start == end)) {
return name;
} else {
return name.substr(9, name.length()-16);
}
}
void
AMDIL7XXAsmPrinter::emitMacroFunc(const MachineInstr *MI,
llvm::raw_ostream &O)
{
const AMDILSubtarget *curTarget = mTM->getSubtargetImpl();
const char *name = "unknown";
llvm::StringRef nameRef;
if (MI->getOperand(0).isGlobal()) {
nameRef = MI->getOperand(0).getGlobal()->getName();
name = nameRef.data();
if (curTarget->device()->usesHardware(
AMDILDeviceInfo::DoubleOps)
&& !::strncmp(name, "__sqrt_f64", 10) ) {
name = "__sqrt_f64_7xx";
}
}
emitMCallInst(MI, O, name);
}
bool
AMDIL7XXAsmPrinter::runOnMachineFunction(MachineFunction &lMF)
{
this->MF = &lMF;
mMeta->setMF(&lMF);
mMFI = lMF.getInfo<AMDILMachineFunctionInfo>();
SetupMachineFunction(lMF);
std::string kernelName = MF->getFunction()->getName();
mName = Strip(kernelName);
mKernelName = kernelName;
EmitFunctionHeader();
EmitFunctionBody();
return false;
}
void
AMDIL7XXAsmPrinter::EmitInstruction(const MachineInstr *II)
{
std::string FunStr;
raw_string_ostream OFunStr(FunStr);
formatted_raw_ostream O(OFunStr);
const AMDILSubtarget *curTarget = mTM->getSubtargetImpl();
if (mDebugMode) {
O << ";" ;
II->print(O);
}
if (isMacroFunc(II)) {
emitMacroFunc(II, O);
O.flush();
OutStreamer.EmitRawText(StringRef(FunStr));
return;
}
if (isMacroCall(II)) {
const char *name;
name = mTM->getInstrInfo()->getName(II->getOpcode()) + 5;
int macronum = amd::MacroDBFindMacro(name);
O << "\t;"<< name<<"\n";
O << "\tmcall("<<macronum<<")";
if (curTarget->device()->isSupported(
AMDILDeviceInfo::MacroDB)) {
mMacroIDs.insert(macronum);
} else {
mMFI->addCalledIntr(macronum);
}
}
// Print the assembly for the instruction.
// We want to make sure that we do HW constants
// before we do arena segment
if (mMeta->useCompilerWrite(II)) {
// TODO: This is a hack to get around some
// conformance failures.
O << "\tif_logicalz cb0[0].x\n";
O << "\tuav_raw_store_id("
<< curTarget->device()->getResourceID(AMDILDevice::RAW_UAV_ID)
<< ") ";
O << "mem0.x___, cb0[3].x, r0.0\n";
O << "\tendif\n";
mMFI->addMetadata(";memory:compilerwrite");
} else {
printInstruction(II, O);
}
O.flush();
OutStreamer.EmitRawText(StringRef(FunStr));
}
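The Strip() helper above relies on fixed token lengths: "__OpenCL_" is 9 characters and "_kernel" is 7, so substr(9, length - 16) drops both. A standalone copy of the logic (hypothetical function name) showing the arithmetic:

```cpp
#include <cassert>
#include <string>

// Same logic as Strip() above: drop the 9-character "__OpenCL_" prefix
// and the 7-character "_kernel" suffix when both are present.
static std::string stripKernelName(const std::string &name) {
  size_t start = name.find("__OpenCL_");
  size_t end = name.find("_kernel");
  if (start == std::string::npos || end == std::string::npos || start == end) {
    return name;  // not a mangled kernel name; leave it unchanged
  }
  return name.substr(9, name.length() - 16);
}
```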


@ -0,0 +1,162 @@
//===-- AMDILAsmPrinterEG.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILEGAsmPrinter.h"
#include "AMDILAlgorithms.tpp"
#include "AMDILDevices.h"
#include "AMDILGlobalManager.h"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/DebugInfo.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Constants.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/Metadata.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugLoc.h"
#include "llvm/Support/InstIterator.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Type.h"
using namespace llvm;
// TODO: Add support for verbose.
AMDILEGAsmPrinter::AMDILEGAsmPrinter(TargetMachine& TM, MCStreamer &Streamer)
: AMDILAsmPrinter(TM, Streamer)
{
}
AMDILEGAsmPrinter::~AMDILEGAsmPrinter()
{
}
//
// @param name
// @brief Strips KERNEL_PREFIX and KERNEL_SUFFIX from the name
// and returns the stripped name if both tokens are present.
//
static
std::string Strip(const std::string &name)
{
size_t start = name.find("__OpenCL_");
size_t end = name.find("_kernel");
if (start == std::string::npos
|| end == std::string::npos
|| (start == end)) {
return name;
} else {
return name.substr(9, name.length()-16);
}
}
void
AMDILEGAsmPrinter::emitMacroFunc(const MachineInstr *MI,
llvm::raw_ostream &O)
{
const AMDILSubtarget *curTarget = mTM->getSubtargetImpl();
const char *name = "unknown";
llvm::StringRef nameRef;
if (MI->getOperand(0).isGlobal()) {
nameRef = MI->getOperand(0).getGlobal()->getName();
name = nameRef.data();
}
if (!::strncmp(name, "__fma_f32", 9) && curTarget->device()->usesHardware(
AMDILDeviceInfo::FMA)) {
name = "__hwfma_f32";
}
emitMCallInst(MI, O, name);
}
bool
AMDILEGAsmPrinter::runOnMachineFunction(MachineFunction &lMF)
{
this->MF = &lMF;
mMeta->setMF(&lMF);
mMFI = lMF.getInfo<AMDILMachineFunctionInfo>();
SetupMachineFunction(lMF);
std::string kernelName = MF->getFunction()->getName();
mName = Strip(kernelName);
mKernelName = kernelName;
EmitFunctionHeader();
EmitFunctionBody();
return false;
}
void
AMDILEGAsmPrinter::EmitInstruction(const MachineInstr *II)
{
std::string FunStr;
raw_string_ostream OFunStr(FunStr);
formatted_raw_ostream O(OFunStr);
const AMDILSubtarget *curTarget = mTM->getSubtargetImpl();
if (mDebugMode) {
O << ";" ;
II->print(O);
}
if (isMacroFunc(II)) {
emitMacroFunc(II, O);
O.flush();
OutStreamer.EmitRawText(StringRef(FunStr));
return;
}
if (isMacroCall(II)) {
const char *name;
name = mTM->getInstrInfo()->getName(II->getOpcode()) + 5;
if (!::strncmp(name, "__fma_f32", 9)
&& curTarget->device()->usesHardware(
AMDILDeviceInfo::FMA)) {
name = "__hwfma_f32";
}
//assert(0 &&
//"Found a macro that is still in use!");
int macronum = amd::MacroDBFindMacro(name);
O << "\t;"<< name<<"\n";
O << "\tmcall("<<macronum<<")";
if (curTarget->device()->isSupported(
AMDILDeviceInfo::MacroDB)) {
mMacroIDs.insert(macronum);
} else {
mMFI->addCalledIntr(macronum);
}
}
// Print the assembly for the instruction.
// We want to make sure that we do HW constants
// before we do arena segment
// TODO: This is a hack to get around some
// conformance failures.
if (mMeta->useCompilerWrite(II)) {
O << "\tif_logicalz cb0[0].x\n";
if (mMFI->usesMem(AMDILDevice::RAW_UAV_ID)) {
O << "\tuav_raw_store_id("
<< curTarget->device()->getResourceID(AMDILDevice::RAW_UAV_ID)
<< ") ";
O << "mem0.x___, cb0[3].x, r0.0\n";
} else {
O << "\tuav_arena_store_id("
<< curTarget->device()->getResourceID(AMDILDevice::ARENA_UAV_ID)
<< ")_size(dword) ";
O << "cb0[3].x, r0.0\n";
}
O << "\tendif\n";
mMFI->addMetadata(";memory:compilerwrite");
} else {
printInstruction(II, O);
}
O.flush();
OutStreamer.EmitRawText(StringRef(FunStr));
}


@ -0,0 +1,254 @@
//===----- AMDILBarrierDetect.cpp - Barrier Detect pass -*- C++ -*- ------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#define DEBUG_TYPE "BarrierDetect"
#ifdef DEBUG
#define DEBUGME (DebugFlag && isCurrentDebugType(DEBUG_TYPE))
#else
#define DEBUGME 0
#endif
#include "AMDILAlgorithms.tpp"
#include "AMDILCompilerWarnings.h"
#include "AMDILDevices.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILSubtarget.h"
#include "AMDILTargetMachine.h"
#include "llvm/BasicBlock.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Function.h"
#include "llvm/Instructions.h"
#include "llvm/Module.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
// The barrier detect pass determines if a barrier has been duplicated in the
// source program which can cause undefined behaviour if more than a single
// wavefront is executed in a group. This is because LLVM does not have an
// execution barrier and if this barrier function gets duplicated, undefined
// behaviour can occur. In order to work around this, we detect the duplicated
// barrier and then make the work-group execute in a single wavefront mode,
// essentially making the barrier a no-op.
namespace
{
class LLVM_LIBRARY_VISIBILITY AMDILBarrierDetect : public FunctionPass
{
TargetMachine &TM;
static char ID;
public:
AMDILBarrierDetect(TargetMachine &TM AMDIL_OPT_LEVEL_DECL);
~AMDILBarrierDetect();
const char *getPassName() const;
bool runOnFunction(Function &F);
bool doInitialization(Module &M);
bool doFinalization(Module &M);
void getAnalysisUsage(AnalysisUsage &AU) const;
private:
bool detectBarrier(BasicBlock::iterator *BBI);
bool detectMemFence(BasicBlock::iterator *BBI);
bool mChanged;
SmallVector<int64_t, DEFAULT_VEC_SLOTS> bVecMap;
const AMDILSubtarget *mStm;
// Constants used to define memory type.
static const unsigned int LOCAL_MEM_FENCE = 1<<0;
static const unsigned int GLOBAL_MEM_FENCE = 1<<1;
static const unsigned int REGION_MEM_FENCE = 1<<2;
};
char AMDILBarrierDetect::ID = 0;
} // anonymous namespace
namespace llvm
{
FunctionPass *
createAMDILBarrierDetect(TargetMachine &TM AMDIL_OPT_LEVEL_DECL)
{
return new AMDILBarrierDetect(TM AMDIL_OPT_LEVEL_VAR);
}
} // llvm namespace
AMDILBarrierDetect::AMDILBarrierDetect(TargetMachine &TM
AMDIL_OPT_LEVEL_DECL)
:
FunctionPass(ID),
TM(TM)
{
}
AMDILBarrierDetect::~AMDILBarrierDetect()
{
}
bool AMDILBarrierDetect::detectBarrier(BasicBlock::iterator *BBI)
{
SmallVector<int64_t, DEFAULT_VEC_SLOTS>::iterator bIter;
int64_t bID;
Instruction *inst = (*BBI);
CallInst *CI = dyn_cast<CallInst>(inst);
if (!CI || !CI->getNumOperands()) {
return false;
}
const Value *funcVal = CI->getOperand(CI->getNumOperands() - 1);
if (funcVal && strncmp(funcVal->getName().data(), "__amd_barrier", 13)) {
return false;
}
if (inst->getNumOperands() >= 3) {
const Value *V = inst->getOperand(0);
const ConstantInt *Cint = dyn_cast<ConstantInt>(V);
if (!Cint) {
return false;
}
bID = Cint->getSExtValue();
bIter = std::find(bVecMap.begin(), bVecMap.end(), bID);
if (bIter == bVecMap.end()) {
bVecMap.push_back(bID);
} else {
if (mStm->device()->isSupported(AMDILDeviceInfo::BarrierDetect)) {
AMDILMachineFunctionInfo *MFI =
getAnalysis<MachineFunctionAnalysis>().getMF()
.getInfo<AMDILMachineFunctionInfo>();
MFI->addMetadata(";limitgroupsize");
MFI->addErrorMsg(amd::CompilerWarningMessage[BAD_BARRIER_OPT]);
}
}
}
if (mStm->device()->getGeneration() == AMDILDeviceInfo::HD4XXX) {
AMDILMachineFunctionInfo *MFI =
getAnalysis<MachineFunctionAnalysis>().getMF()
.getInfo<AMDILMachineFunctionInfo>();
MFI->addErrorMsg(amd::CompilerWarningMessage[LIMIT_BARRIER]);
MFI->addMetadata(";limitgroupsize");
MFI->setUsesLocal();
}
const Value *V = inst->getOperand(inst->getNumOperands()-2);
const ConstantInt *Cint = dyn_cast<ConstantInt>(V);
Function *iF = dyn_cast<Function>(inst->getOperand(inst->getNumOperands()-1));
if (!Cint || !iF) {
return false;
}
Module *M = iF->getParent();
bID = Cint->getSExtValue();
if (bID > 0) {
const char *name = "barrier";
if (bID == GLOBAL_MEM_FENCE) {
name = "barrierGlobal";
} else if (bID == LOCAL_MEM_FENCE
&& mStm->device()->usesHardware(AMDILDeviceInfo::LocalMem)) {
name = "barrierLocal";
} else if (bID == REGION_MEM_FENCE
&& mStm->device()->usesHardware(AMDILDeviceInfo::RegionMem)) {
name = "barrierRegion";
}
Function *nF =
dyn_cast<Function>(M->getOrInsertFunction(name, iF->getFunctionType()));
inst->setOperand(inst->getNumOperands()-1, nF);
return false;
}
return false;
}
bool AMDILBarrierDetect::detectMemFence(BasicBlock::iterator *BBI)
{
int64_t bID;
Instruction *inst = (*BBI);
CallInst *CI = dyn_cast<CallInst>(inst);
if (!CI || CI->getNumOperands() != 2) {
return false;
}
const Value *V = inst->getOperand(inst->getNumOperands()-2);
const ConstantInt *Cint = dyn_cast<ConstantInt>(V);
Function *iF = dyn_cast<Function>(inst->getOperand(inst->getNumOperands()-1));
const char *fence_local_name;
const char *fence_global_name;
const char *fence_region_name;
const char* fence_name = "mem_fence";
if (!iF || !Cint) {
return false;
}
if (strncmp(iF->getName().data(), "mem_fence", 9) == 0) {
fence_local_name = "mem_fence_local";
fence_global_name = "mem_fence_global";
fence_region_name = "mem_fence_region";
} else if (strncmp(iF->getName().data(), "read_mem_fence", 14) == 0) {
fence_local_name = "read_mem_fence_local";
fence_global_name = "read_mem_fence_global";
fence_region_name = "read_mem_fence_region";
} else if (strncmp(iF->getName().data(), "write_mem_fence", 15) == 0) {
fence_local_name = "write_mem_fence_local";
fence_global_name = "write_mem_fence_global";
fence_region_name = "write_mem_fence_region";
} else {
return false;
}
Module *M = iF->getParent();
bID = Cint->getSExtValue();
if (bID > 0) {
const char *name = fence_name;
if (bID == GLOBAL_MEM_FENCE) {
name = fence_global_name;
} else if (bID == LOCAL_MEM_FENCE
&& mStm->device()->usesHardware(AMDILDeviceInfo::LocalMem)) {
name = fence_local_name;
} else if (bID == REGION_MEM_FENCE
&& mStm->device()->usesHardware(AMDILDeviceInfo::RegionMem)) {
name = fence_region_name;
}
Function *nF =
dyn_cast<Function>(M->getOrInsertFunction(name, iF->getFunctionType()));
inst->setOperand(inst->getNumOperands()-1, nF);
return false;
}
return false;
}
bool AMDILBarrierDetect::runOnFunction(Function &MF)
{
mChanged = false;
bVecMap.clear();
mStm = &TM.getSubtarget<AMDILSubtarget>();
Function *F = &MF;
safeNestedForEach(F->begin(), F->end(), F->begin()->begin(),
std::bind1st(
std::mem_fun(
&AMDILBarrierDetect::detectBarrier), this));
safeNestedForEach(F->begin(), F->end(), F->begin()->begin(),
std::bind1st(
std::mem_fun(
&AMDILBarrierDetect::detectMemFence), this));
return mChanged;
}
const char* AMDILBarrierDetect::getPassName() const
{
return "AMDIL Barrier Detect Pass";
}
bool AMDILBarrierDetect::doInitialization(Module &M)
{
return false;
}
bool AMDILBarrierDetect::doFinalization(Module &M)
{
return false;
}
void AMDILBarrierDetect::getAnalysisUsage(AnalysisUsage &AU) const
{
AU.addRequired<MachineFunctionAnalysis>();
FunctionPass::getAnalysisUsage(AU);
AU.setPreservesAll();
}

//===- AMDIL.td - AMDIL Target Machine -------------*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// Target-independent interfaces which we are implementing
//===----------------------------------------------------------------------===//
include "llvm/Target/Target.td"
//===----------------------------------------------------------------------===//
// AMDIL Subtarget features.
//===----------------------------------------------------------------------===//
def FeatureFP64 : SubtargetFeature<"fp64",
"CapsOverride[AMDILDeviceInfo::DoubleOps]",
"true",
"Enable 64bit double precision operations">;
def FeatureByteAddress : SubtargetFeature<"byte_addressable_store",
"CapsOverride[AMDILDeviceInfo::ByteStores]",
"true",
"Enable byte addressable stores">;
def FeatureBarrierDetect : SubtargetFeature<"barrier_detect",
"CapsOverride[AMDILDeviceInfo::BarrierDetect]",
"true",
"Enable duplicate barrier detection (HD5XXX or later).">;
def FeatureImages : SubtargetFeature<"images",
"CapsOverride[AMDILDeviceInfo::Images]",
"true",
"Enable image functions">;
def FeatureMultiUAV : SubtargetFeature<"multi_uav",
"CapsOverride[AMDILDeviceInfo::MultiUAV]",
"true",
"Generate multiple UAV code (HD5XXX family or later)">;
def FeatureMacroDB : SubtargetFeature<"macrodb",
"CapsOverride[AMDILDeviceInfo::MacroDB]",
"true",
"Use internal macrodb, instead of macrodb in driver">;
def FeatureNoAlias : SubtargetFeature<"noalias",
"CapsOverride[AMDILDeviceInfo::NoAlias]",
"true",
"Assert that all kernel argument pointers are not aliased">;
def FeatureNoInline : SubtargetFeature<"no-inline",
"CapsOverride[AMDILDeviceInfo::NoInline]",
"true",
"Specify whether to disable function inlining">;
def Feature64BitPtr : SubtargetFeature<"64BitPtr",
"mIs64bit",
"false",
"Specify if 64bit addressing should be used.">;
def Feature32on64BitPtr : SubtargetFeature<"64on32BitPtr",
"mIs32on64bit",
"false",
"Specify if 64bit sized pointers with 32bit addressing should be used.">;
def FeatureDebug : SubtargetFeature<"debug",
"CapsOverride[AMDILDeviceInfo::Debug]",
"true",
"Debug mode is enabled, so disable hardware accelerated address spaces.">;
//===----------------------------------------------------------------------===//
// Register File, Calling Conv, Instruction Descriptions
//===----------------------------------------------------------------------===//
include "AMDILRegisterInfo.td"
include "AMDILCallingConv.td"
include "AMDILInstrInfo.td"
def AMDILInstrInfo : InstrInfo {}
//===----------------------------------------------------------------------===//
// AMDIL processors supported.
//===----------------------------------------------------------------------===//
//include "Processors.td"
//===----------------------------------------------------------------------===//
// Declare the target which we are implementing
//===----------------------------------------------------------------------===//
def AMDILAsmWriter : AsmWriter {
string AsmWriterClassName = "AsmPrinter";
int Variant = 0;
}
def AMDILAsmParser : AsmParser {
string AsmParserClassName = "AsmParser";
int Variant = 0;
string CommentDelimiter = ";";
string RegisterPrefix = "r";
}
def AMDIL : Target {
// Pull in Instruction Info:
let InstructionSet = AMDILInstrInfo;
let AssemblyWriters = [AMDILAsmWriter];
let AssemblyParsers = [AMDILAsmParser];
}

//===- AMDILCallingConv.td - Calling Conventions AMDIL -----*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This describes the calling conventions for the AMDIL architectures.
//
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Return Value Calling Conventions
//===----------------------------------------------------------------------===//
// AMDIL 32-bit C return-value convention.
def RetCC_AMDIL32 : CallingConv<[
// Since IL has no return values, all values can be emulated on the stack.
// The stack can then be mapped to a number of sequential virtual registers
// in IL.
// Integer and FP scalar values get put on the stack at 16-byte alignment
// but with a size of 4 bytes.
CCIfType<[i1, i8, i16, i32, f32, f64, i64], CCAssignToReg<
[
R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 2-element Short vector types get 16 byte alignment and size of 8 bytes
CCIfType<[v2i32, v2f32, v2i8, v4i8, v2i16, v4i16], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 4-element Short vector types get 16 byte alignment and size of 16 bytes
CCIfType<[v4i32, v4f32], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 2-element 64-bit vector types get aligned to 16 bytes with a size of 16 bytes
CCIfType<[v2f64, v2i64], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >, CCAssignToStack<16, 16>
]>;
// AMDIL 32-bit C Calling convention.
def CC_AMDIL32 : CallingConv<[
// Since IL has parameter values, all values can be emulated on the stack.
// The stack can then be mapped to a number of sequential virtual registers
// in IL.
// Integer and FP scalar values get put on the stack at 16-byte alignment
// but with a size of 4 bytes.
CCIfType<[i1, i8, i16, i32, f32, f64, i64], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 2-element Short vector types get 16 byte alignment and size of 8 bytes
CCIfType<[v2i32, v2f32, v2i8, v4i8, v2i16, v4i16], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 4-element Short vector types get 16 byte alignment and size of 16 bytes
CCIfType<[v4i32, v4f32], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >,
// 2-element 64-bit vector types get aligned to 16 bytes with a size of 16 bytes
CCIfType<[v2f64, v2i64], CCAssignToReg<
[R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11, R12, R13, R14, R15, R16, R17, R18, R19, R20, R21, R22, R23, R24, R25, R26, R27, R28, R29, R30, R31, R32, R33, R34, R35, R36, R37, R38, R39, R40, R41, R42, R43, R44, R45, R46, R47, R48, R49, R50, R51, R52, R53, R54, R55, R56, R57, R58, R59, R60, R61, R62, R63, R64, R65, R66, R67, R68, R69, R70, R71, R72, R73, R74, R75, R76, R77, R78, R79, R80, R81, R82, R83, R84, R85, R86, R87, R88, R89, R90, R91, R92, R93, R94, R95, R96, R97, R98, R99, R100, R101, R102, R103, R104, R105, R106, R107, R108, R109, R110, R111, R112, R113, R114, R115, R116, R117, R118, R119, R120, R121, R122, R123, R124, R125, R126, R127, R128, R129, R130, R131, R132, R133, R134, R135, R136, R137, R138, R139, R140, R141, R142, R143, R144, R145, R146, R147, R148, R149, R150, R151, R152, R153, R154, R155, R156, R157, R158, R159, R160, R161, R162, R163, R164, R165, R166, R167, R168, R169, R170, R171, R172, R173, R174, R175, R176, R177, R178, R179, R180, R181, R182, R183, R184, R185, R186, R187, R188, R189, R190, R191, R192, R193, R194, R195, R196, R197, R198, R199, R200, R201, R202, R203, R204, R205, R206, R207, R208, R209, R210, R211, R212, R213, R214, R215, R216, R217, R218, R219, R220, R221, R222, R223, R224, R225, R226, R227, R228, R229, R230, R231, R232, R233, R234, R235, R236, R237, R238, R239, R240, R241, R242, R243, R244, R245, R246, R247, R248, R249, R250, R251, R252, R253, R254, R255, R256, R257, R258, R259, R260, R261, R262, R263, R264, R265, R266, R267, R268, R269, R270, R271, R272, R273, R274, R275, R276, R277, R278, R279, R280, R281, R282, R283, R284, R285, R286, R287, R288, R289, R290, R291, R292, R293, R294, R295, R296, R297, R298, R299, R300, R301, R302, R303, R304, R305, R306, R307, R308, R309, R310, R311, R312, R313, R314, R315, R316, R317, R318, R319, R320, R321, R322, R323, R324, R325, R326, R327, R328, R329, R330, R331, R332, R333, R334, R335, R336, R337, R338, R339, R340, R341, R342, R343, R344, R345, R346, R347, R348, R349, R350, R351, 
R352, R353, R354, R355, R356, R357, R358, R359, R360, R361, R362, R363, R364, R365, R366, R367, R368, R369, R370, R371, R372, R373, R374, R375, R376, R377, R378, R379, R380, R381, R382, R383, R384, R385, R386, R387, R388, R389, R390, R391, R392, R393, R394, R395, R396, R397, R398, R399, R400, R401, R402, R403, R404, R405, R406, R407, R408, R409, R410, R411, R412, R413, R414, R415, R416, R417, R418, R419, R420, R421, R422, R423, R424, R425, R426, R427, R428, R429, R430, R431, R432, R433, R434, R435, R436, R437, R438, R439, R440, R441, R442, R443, R444, R445, R446, R447, R448, R449, R450, R451, R452, R453, R454, R455, R456, R457, R458, R459, R460, R461, R462, R463, R464, R465, R466, R467, R468, R469, R470, R471, R472, R473, R474, R475, R476, R477, R478, R479, R480, R481, R482, R483, R484, R485, R486, R487, R488, R489, R490, R491, R492, R493, R494, R495, R496, R497, R498, R499, R500, R501, R502, R503, R504, R505, R506, R507, R508, R509, R510, R511, R512, R513, R514, R515, R516, R517, R518, R519, R520, R521, R522, R523, R524, R525, R526, R527, R528, R529, R530, R531, R532, R533, R534, R535, R536, R537, R538, R539, R540, R541, R542, R543, R544, R545, R546, R547, R548, R549, R550, R551, R552, R553, R554, R555, R556, R557, R558, R559, R560, R561, R562, R563, R564, R565, R566, R567, R568, R569, R570, R571, R572, R573, R574, R575, R576, R577, R578, R579, R580, R581, R582, R583, R584, R585, R586, R587, R588, R589, R590, R591, R592, R593, R594, R595, R596, R597, R598, R599, R600, R601, R602, R603, R604, R605, R606, R607, R608, R609, R610, R611, R612, R613, R614, R615, R616, R617, R618, R619, R620, R621, R622, R623, R624, R625, R626, R627, R628, R629, R630, R631, R632, R633, R634, R635, R636, R637, R638, R639, R640, R641, R642, R643, R644, R645, R646, R647, R648, R649, R650, R651, R652, R653, R654, R655, R656, R657, R658, R659, R660, R661, R662, R663, R664, R665, R666, R667, R668, R669, R670, R671, R672, R673, R674, R675, R676, R677, R678, R679, R680, R681, R682, R683, R684, 
R685, R686, R687, R688, R689, R690, R691, R692, R693, R694, R695, R696, R697, R698, R699, R700, R701, R702, R703, R704, R705, R706, R707, R708, R709, R710, R711, R712, R713, R714, R715, R716, R717, R718, R719, R720, R721, R722, R723, R724, R725, R726, R727, R728, R729, R730, R731, R732, R733, R734, R735, R736, R737, R738, R739, R740, R741, R742, R743, R744, R745, R746, R747, R748, R749, R750, R751, R752, R753, R754, R755, R756, R757, R758, R759, R760, R761, R762, R763, R764, R765, R766, R767
]> >, CCAssignToStack<16, 16>
]>;
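The exhaustive `R1, R2, …, R767` lists repeated in each `CCAssignToReg` entry above read as machine-generated rather than hand-written. A throwaway generator sketch (hypothetical helper, not part of this commit) shows how such a list would be produced:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Emit the "R1, R2, ..., R767" register list that recurs in the
// CCAssignToReg entries above.
std::string makeRegList(int first, int last) {
  std::ostringstream os;
  for (int i = first; i <= last; ++i) {
    if (i != first)
      os << ", ";
    os << "R" << i;
  }
  return os.str();
}
```

Each register list in the TableGen fragment above corresponds to `makeRegList(1, 767)`.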

@@ -0,0 +1,46 @@
//===-- AMDILCodeEmitter.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef AMDILCODEEMITTER_H
#define AMDILCODEEMITTER_H
#include <stdint.h>
namespace llvm {
class MachineInstr;
class MachineOperand;
/* XXX: Temp HACK to work around tablegen name generation */
class AMDILCodeEmitter {
public:
  uint64_t getBinaryCodeForInstr(const MachineInstr &MI) const;
  virtual uint64_t getMachineOpValue(const MachineInstr &MI,
                                     const MachineOperand &MO) const {
    return 0;
  }
  virtual unsigned GPR4AlignEncode(const MachineInstr &MI,
                                   unsigned OpNo) const {
    return 0;
  }
  virtual unsigned GPR2AlignEncode(const MachineInstr &MI,
                                   unsigned OpNo) const {
    return 0;
  }
  virtual uint64_t VOPPostEncode(const MachineInstr &MI,
                                 uint64_t Value) const {
    return Value;
  }
  virtual uint64_t i32LiteralEncode(const MachineInstr &MI,
                                    unsigned OpNo) const {
    return 0;
  }
};
} // End namespace llvm
#endif // AMDILCODEEMITTER_H
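The virtual hooks above return neutral defaults (0, or the value unchanged) so that the TableGen-generated encoder can call every hook unconditionally; only subtargets that need a post-encoding fixup override them. A standalone sketch of that pattern, with hypothetical names (the real hooks take a `MachineInstr`):

```cpp
#include <cassert>
#include <stdint.h>

// Stand-in for the AMDILCodeEmitter hook pattern: the base class
// supplies a neutral default so callers never need a null check.
struct EncoderHooks {
  virtual ~EncoderHooks() {}
  virtual uint64_t postEncode(uint64_t Value) const { return Value; }
};

// A subtarget that needs a fixup overrides just that hook; here we
// set a hypothetical marker bit in the encoded instruction word.
struct VOPEncoderHooks : EncoderHooks {
  virtual uint64_t postEncode(uint64_t Value) const {
    return Value | (1ull << 63);
  }
};
```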

@@ -0,0 +1,75 @@
//===-- AMDILCompilerErrors.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDIL_COMPILER_ERRORS_H_
#define _AMDIL_COMPILER_ERRORS_H_
// Compiler errors generated by the backend that will cause
// the runtime to abort compilation. These are mainly for
// device constraint violations or invalid code.
namespace amd {
#define INVALID_COMPUTE 0
#define GENERIC_ERROR 1
#define INTERNAL_ERROR 2
#define MISSING_FUNCTION_CALL 3
#define RESERVED_FUNCTION 4
#define BYTE_STORE_ERROR 5
#define UNKNOWN_TYPE_NAME 6
#define NO_IMAGE_SUPPORT 7
#define NO_ATOMIC_32 8
#define NO_ATOMIC_64 9
#define IRREDUCIBLE_CF 10
#define INSUFFICIENT_RESOURCES 11
#define INSUFFICIENT_LOCAL_RESOURCES 12
#define INSUFFICIENT_PRIVATE_RESOURCES 13
#define INSUFFICIENT_IMAGE_RESOURCES 14
#define DOUBLE_NOT_SUPPORTED 15
#define INVALID_CONSTANT_WRITE 16
#define INSUFFICIENT_CONSTANT_RESOURCES 17
#define INSUFFICIENT_COUNTER_RESOURCES 18
#define INSUFFICIENT_REGION_RESOURCES 19
#define REGION_MEMORY_ERROR 20
#define MEMOP_NO_ALLOCATION 21
#define RECURSIVE_FUNCTION 22
#define INCORRECT_COUNTER_USAGE 23
#define INVALID_INTRINSIC_USAGE 24
#define NUM_ERROR_MESSAGES 25
static const char *CompilerErrorMessage[NUM_ERROR_MESSAGES] =
{
"E000:Compute Shader Not Supported! ",
"E001:Generic Compiler Error Message! ",
"E002:Internal Compiler Error Message!",
"E003:Missing Function Call Detected! ",
"E004:Reserved Function Call Detected!",
"E005:Byte Addressable Stores Invalid!",
"E006:Kernel Arg Type Name Is Invalid!",
"E007:Image 1.0 Extension Unsupported!",
"E008:32bit Atomic Op are Unsupported!",
"E009:64bit Atomic Op are Unsupported!",
"E010:Irreducible ControlFlow Detected",
"E011:Insufficient Resources Detected!",
"E012:Insufficient Local Resources! ",
"E013:Insufficient Private Resources! ",
"E014:Images not currently supported! ",
"E015:Double precision not supported! ",
"E016:Invalid Constant Memory Write! ",
"E017:Max number Constant Ptr reached!",
"E018:Max number of Counters reached! ",
"E019:Insufficient Region Resources! ",
"E020:Region address space invalid! ",
"E021:MemOp with no memory allocated! ",
"E022:Recursive Function detected! ",
"E023:Illegal Inc+Dec to same counter!",
"E024:Illegal usage of intrinsic inst!"
};
}
#endif // _AMDIL_COMPILER_ERRORS_H_
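The message table is indexed directly by the numeric error codes above, so the two lists must stay in sync. A defensive lookup would clamp out-of-range codes to the generic message; a hypothetical sketch using a subset of the table:

```cpp
#include <cassert>
#include <cstddef>

// Subset of CompilerErrorMessage, for illustration only; out-of-range
// codes fall back to the generic message (code 1, GENERIC_ERROR).
static const char *Messages[] = {
  "E000:Compute Shader Not Supported! ",
  "E001:Generic Compiler Error Message! ",
  "E002:Internal Compiler Error Message!",
};

const char *errorMessage(unsigned Code) {
  const size_t N = sizeof(Messages) / sizeof(Messages[0]);
  return Code < N ? Messages[Code] : Messages[1];
}
```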

@@ -0,0 +1,31 @@
//===-- AMDILCompilerWarnings.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDIL_COMPILER_WARNINGS_H_
#define _AMDIL_COMPILER_WARNINGS_H_
/// Compiler backend generated warnings that might cause
/// issues with compilation. These warnings become errors if
/// -Werror is specified on the command line.
namespace amd {
#define LIMIT_BARRIER 0
#define BAD_BARRIER_OPT 1
#define RECOVERABLE_ERROR 2
#define NUM_WARN_MESSAGES 3
/// All warnings must be prefixed with the W token or they might be
/// treated as errors.
static const char *CompilerWarningMessage[NUM_WARN_MESSAGES] =
{
"W000:Barrier caused limited groupsize",
"W001:Dangerous Barrier Opt Detected! ",
"W002:Recoverable BE Error Detected! "
};
}
#endif // _AMDIL_COMPILER_WARNINGS_H_

(File diff suppressed because it is too large.)

@@ -0,0 +1,137 @@
//===-- AMDILDevice.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILDevice.h"
#include "AMDILSubtarget.h"
using namespace llvm;
// Default implementation for all of the classes.
AMDILDevice::AMDILDevice(AMDILSubtarget *ST) : mSTM(ST)
{
mHWBits.resize(AMDILDeviceInfo::MaxNumberCapabilities);
mSWBits.resize(AMDILDeviceInfo::MaxNumberCapabilities);
setCaps();
mDeviceFlag = OCL_DEVICE_ALL;
}
AMDILDevice::~AMDILDevice()
{
mHWBits.clear();
mSWBits.clear();
}
size_t AMDILDevice::getMaxGDSSize() const
{
return 0;
}
uint32_t
AMDILDevice::getDeviceFlag() const
{
return mDeviceFlag;
}
size_t AMDILDevice::getMaxNumCBs() const
{
if (usesHardware(AMDILDeviceInfo::ConstantMem)) {
return HW_MAX_NUM_CB;
}
return 0;
}
size_t AMDILDevice::getMaxCBSize() const
{
if (usesHardware(AMDILDeviceInfo::ConstantMem)) {
return MAX_CB_SIZE;
}
return 0;
}
size_t AMDILDevice::getMaxScratchSize() const
{
return 65536;
}
uint32_t AMDILDevice::getStackAlignment() const
{
return 16;
}
void AMDILDevice::setCaps()
{
mSWBits.set(AMDILDeviceInfo::HalfOps);
mSWBits.set(AMDILDeviceInfo::ByteOps);
mSWBits.set(AMDILDeviceInfo::ShortOps);
mSWBits.set(AMDILDeviceInfo::HW64BitDivMod);
if (mSTM->isOverride(AMDILDeviceInfo::NoInline)) {
mSWBits.set(AMDILDeviceInfo::NoInline);
}
if (mSTM->isOverride(AMDILDeviceInfo::MacroDB)) {
mSWBits.set(AMDILDeviceInfo::MacroDB);
}
if (mSTM->isOverride(AMDILDeviceInfo::Debug)) {
mSWBits.set(AMDILDeviceInfo::ConstantMem);
} else {
mHWBits.set(AMDILDeviceInfo::ConstantMem);
}
if (mSTM->isOverride(AMDILDeviceInfo::Debug)) {
mSWBits.set(AMDILDeviceInfo::PrivateMem);
} else {
mHWBits.set(AMDILDeviceInfo::PrivateMem);
}
if (mSTM->isOverride(AMDILDeviceInfo::BarrierDetect)) {
mSWBits.set(AMDILDeviceInfo::BarrierDetect);
}
mSWBits.set(AMDILDeviceInfo::ByteLDSOps);
mSWBits.set(AMDILDeviceInfo::LongOps);
}
AMDILDeviceInfo::ExecutionMode
AMDILDevice::getExecutionMode(AMDILDeviceInfo::Caps Caps) const
{
if (mHWBits[Caps]) {
assert(!mSWBits[Caps] && "Cannot set both SW and HW caps");
return AMDILDeviceInfo::Hardware;
}
if (mSWBits[Caps]) {
assert(!mHWBits[Caps] && "Cannot set both SW and HW caps");
return AMDILDeviceInfo::Software;
}
return AMDILDeviceInfo::Unsupported;
}
bool AMDILDevice::isSupported(AMDILDeviceInfo::Caps Mode) const
{
return getExecutionMode(Mode) != AMDILDeviceInfo::Unsupported;
}
bool AMDILDevice::usesHardware(AMDILDeviceInfo::Caps Mode) const
{
return getExecutionMode(Mode) == AMDILDeviceInfo::Hardware;
}
bool AMDILDevice::usesSoftware(AMDILDeviceInfo::Caps Mode) const
{
return getExecutionMode(Mode) == AMDILDeviceInfo::Software;
}
std::string
AMDILDevice::getDataLayout() const
{
return std::string("e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16"
"-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:32:32"
"-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64"
"-v96:128:128-v128:128:128-v192:256:256-v256:256:256"
"-v512:512:512-v1024:1024:1024-v2048:2048:2048"
"-n8:16:32:64");
}
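`setCaps` and `getExecutionMode` maintain the invariant that each capability is flagged in at most one of `mHWBits`/`mSWBits`. A condensed standalone sketch of the same scheme (`std::bitset` standing in for `llvm::BitVector`, capability names abbreviated):

```cpp
#include <bitset>
#include <cassert>

enum Cap { ConstantMem, PrivateMem, MaxCaps };
enum Mode { Unsupported, Software, Hardware };

struct CapTable {
  std::bitset<MaxCaps> hw, sw; // mutually exclusive per capability

  Mode mode(Cap C) const {
    assert(!(hw[C] && sw[C]) && "Cannot set both SW and HW caps");
    if (hw[C]) return Hardware;
    if (sw[C]) return Software;
    return Unsupported;
  }
};
```

A Debug override would set the capability in `sw` instead of `hw`, which is exactly the branch structure of `setCaps` above.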

@@ -0,0 +1,132 @@
//===---- AMDILDevice.h - Define Device Data for AMDIL -----*- C++ -*------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface for the subtarget data classes.
//
//===----------------------------------------------------------------------===//
// This file will define the interface that each generation needs to
// implement in order to correctly answer queries on the capabilities of the
// specific hardware.
//===----------------------------------------------------------------------===//
#ifndef _AMDILDEVICEIMPL_H_
#define _AMDILDEVICEIMPL_H_
#include "AMDIL.h"
#include "llvm/ADT/BitVector.h"
namespace llvm {
class AMDILSubtarget;
class AMDILAsmPrinter;
class AMDILIOExpansion;
class AMDILPointerManager;
class AsmPrinter;
class MCStreamer;
//===----------------------------------------------------------------------===//
// Interface for data that is specific to a single device
//===----------------------------------------------------------------------===//
class AMDILDevice {
public:
AMDILDevice(AMDILSubtarget *ST);
virtual ~AMDILDevice();
// Enum values for the various memory types.
enum {
RAW_UAV_ID = 0,
ARENA_UAV_ID = 1,
LDS_ID = 2,
GDS_ID = 3,
SCRATCH_ID = 4,
CONSTANT_ID = 5,
GLOBAL_ID = 6,
MAX_IDS = 7
} IO_TYPE_IDS;
// Returns the max LDS size that the hardware supports. Size is in
// bytes.
virtual size_t getMaxLDSSize() const = 0;
// Returns the max GDS size that the hardware supports if the GDS is
// supported by the hardware. Size is in bytes.
virtual size_t getMaxGDSSize() const;
// Returns the max number of hardware constant address spaces that
// are supported by this device.
virtual size_t getMaxNumCBs() const;
// Returns the max number of bytes a single hardware constant buffer
// can support. Size is in bytes.
virtual size_t getMaxCBSize() const;
// Returns the max number of bytes allowed by the hardware scratch
// buffer. Size is in bytes.
virtual size_t getMaxScratchSize() const;
// Get the flag that corresponds to the device.
virtual uint32_t getDeviceFlag() const;
// Returns the number of work-items that exist in a single hardware
// wavefront.
virtual size_t getWavefrontSize() const = 0;
// Get the generational name of this specific device.
virtual uint32_t getGeneration() const = 0;
// Get the stack alignment of this specific device.
virtual uint32_t getStackAlignment() const;
// Get the resource ID for this specific device.
virtual uint32_t getResourceID(uint32_t DeviceID) const = 0;
// Get the max number of UAV's for this device.
virtual uint32_t getMaxNumUAVs() const = 0;
// Interface to get the IO Expansion pass for each device.
virtual FunctionPass*
getIOExpansion(TargetMachine& AMDIL_OPT_LEVEL_DECL) const = 0;
// Interface to get the Asm printer for each device.
virtual AsmPrinter*
getAsmPrinter(TargetMachine& TM, MCStreamer &Streamer) const = 0;
// Interface to get the Pointer manager pass for each device.
virtual FunctionPass*
getPointerManager(TargetMachine& AMDIL_OPT_LEVEL_DECL) const = 0;
// API utilizing more detailed capabilities of each family of
// cards. If a capability is supported, then either usesHardware or
// usesSoftware returns true. If usesHardware returns true, then
// usesSoftware must return false for the same capability. Hardware
// execution means that the feature is done natively by the hardware
// and is not emulated in software. Software execution means
// that the feature could be done in the hardware, but there is
// software that emulates it, possibly using the hardware for
// support, since the hardware does not fully comply with OpenCL
// specs.
bool isSupported(AMDILDeviceInfo::Caps Mode) const;
bool usesHardware(AMDILDeviceInfo::Caps Mode) const;
bool usesSoftware(AMDILDeviceInfo::Caps Mode) const;
virtual std::string getDataLayout() const;
static const unsigned int MAX_LDS_SIZE_700 = 16384;
static const unsigned int MAX_LDS_SIZE_800 = 32768;
static const unsigned int WavefrontSize = 64;
static const unsigned int HalfWavefrontSize = 32;
static const unsigned int QuarterWavefrontSize = 16;
protected:
virtual void setCaps();
llvm::BitVector mHWBits;
llvm::BitVector mSWBits;
AMDILSubtarget *mSTM;
uint32_t mDeviceFlag;
private:
AMDILDeviceInfo::ExecutionMode
getExecutionMode(AMDILDeviceInfo::Caps Caps) const;
}; // AMDILDevice
} // namespace llvm
#endif // _AMDILDEVICEIMPL_H_

@@ -0,0 +1,87 @@
//===-- AMDILDeviceInfo.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILDevices.h"
#include "AMDILSubtarget.h"
using namespace llvm;
namespace llvm {
AMDILDevice*
getDeviceFromName(const std::string &deviceName, AMDILSubtarget *ptr, bool is64bit, bool is64on32bit)
{
if (deviceName.c_str()[2] == '7') {
switch (deviceName.c_str()[3]) {
case '1':
return new AMDIL710Device(ptr);
case '7':
return new AMDIL770Device(ptr);
default:
return new AMDIL7XXDevice(ptr);
}
} else if (deviceName == "cypress") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILCypressDevice(ptr);
} else if (deviceName == "juniper") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILEvergreenDevice(ptr);
} else if (deviceName == "redwood") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILRedwoodDevice(ptr);
} else if (deviceName == "cedar") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILCedarDevice(ptr);
} else if (deviceName == "barts"
|| deviceName == "turks") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILNIDevice(ptr);
} else if (deviceName == "cayman") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILCaymanDevice(ptr);
} else if (deviceName == "caicos") {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDILNIDevice(ptr);
} else if (deviceName == "SI") {
return new AMDILSIDevice(ptr);
} else {
#if DEBUG
assert(!is64bit && "This device does not support 64bit pointers!");
assert(!is64on32bit && "This device does not support 64bit"
" on 32bit pointers!");
#endif
return new AMDIL7XXDevice(ptr);
}
}
}
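Each branch of `getDeviceFromName` repeats the same `DEBUG` asserts, and the 7XX case keys off characters 2 and 3 of the name (i.e. names of the form "rv710"). A table-driven sketch of the name-to-device mapping (hypothetical refactor, not part of this commit):

```cpp
#include <cassert>
#include <map>
#include <string>

enum DeviceKind { Dev7XX, DevEvergreen, DevCypress, DevRedwood,
                  DevCedar, DevNI, DevCayman, DevSI };

// One lookup table instead of the else-if chain; unknown names fall
// back to the 7XX default, matching the original behaviour.
DeviceKind kindFromName(const std::string &Name) {
  static const std::map<std::string, DeviceKind> Kinds = {
    {"cypress", DevCypress}, {"juniper", DevEvergreen},
    {"redwood", DevRedwood}, {"cedar", DevCedar},
    {"barts", DevNI},        {"turks", DevNI},
    {"cayman", DevCayman},   {"caicos", DevNI},
    {"SI", DevSI},
  };
  std::map<std::string, DeviceKind>::const_iterator It = Kinds.find(Name);
  return It != Kinds.end() ? It->second : Dev7XX;
}
```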

@@ -0,0 +1,89 @@
//===-- AMDILDeviceInfo.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDILDEVICEINFO_H_
#define _AMDILDEVICEINFO_H_
#include <string>
namespace llvm
{
class AMDILDevice;
class AMDILSubtarget;
namespace AMDILDeviceInfo
{
// Each capability can be executed using a hardware instruction,
// emulated with a sequence of software instructions, or not
// supported at all.
enum ExecutionMode {
Unsupported = 0, // Unsupported feature on the card (default value)
Software, // This is the execution mode that is set if the
// feature is emulated in software
Hardware // This execution mode is set if the feature exists
// natively in hardware
};
// Any changes to this enum need a corresponding update to the
// twiki page GPUMetadataABI
enum Caps {
HalfOps = 0x1, // Half float is supported or not.
DoubleOps = 0x2, // Double is supported or not.
ByteOps = 0x3, // Byte (char) is supported or not.
ShortOps = 0x4, // Short is supported or not.
LongOps = 0x5, // Long is supported or not.
Images = 0x6, // Images are supported or not.
ByteStores = 0x7, // ByteStores available(!HD4XXX).
ConstantMem = 0x8, // Constant/CB memory.
LocalMem = 0x9, // Local/LDS memory.
PrivateMem = 0xA, // Scratch/Private/Stack memory.
RegionMem = 0xB, // OCL GDS Memory Extension.
FMA = 0xC, // Use HW FMA or SW FMA.
ArenaSegment = 0xD, // Use for Arena UAV per pointer 12-1023.
MultiUAV = 0xE, // Use for UAV per Pointer 0-7.
Reserved0 = 0xF, // ReservedFlag
NoAlias = 0x10, // Cached loads.
Signed24BitOps = 0x11, // Peephole Optimization.
// Debug mode implies that no hardware features or optimizations
// are performed and that all memory accesses go through a single
// UAV (Arena on HD5XXX/HD6XXX and Raw on HD4XXX).
Debug = 0x12, // Debug mode is enabled.
CachedMem = 0x13, // Cached mem is available or not.
BarrierDetect = 0x14, // Detect duplicate barriers.
Reserved1 = 0x15, // Reserved flag
ByteLDSOps = 0x16, // Flag to specify if byte LDS ops are available.
ArenaVectors = 0x17, // Flag to specify if vector loads from arena work.
TmrReg = 0x18, // Flag to specify if Tmr register is supported.
NoInline = 0x19, // Flag to specify that no inlining should occur.
MacroDB = 0x1A, // Flag to specify that backend handles macrodb.
HW64BitDivMod = 0x1B, // Flag for backend to generate 64bit div/mod.
ArenaUAV = 0x1C, // Flag to specify that arena uav is supported.
PrivateUAV = 0x1D, // Flag to specify that private memory uses uav's.
// If more capabilities are required, then
// this number needs to be increased.
// All capabilities must come before this
// number.
MaxNumberCapabilities = 0x20
};
// These have to be in order with the older generations
// having the lower number enumerations.
enum Generation {
HD4XXX = 0, // 7XX based devices.
HD5XXX, // Evergreen based devices.
HD6XXX, // NI/Evergreen+ based devices.
HD7XXX, // Southern Islands based devices.
HDTEST, // Experimental feature testing device.
HDNUMGEN
};
} // namespace AMDILDeviceInfo
llvm::AMDILDevice*
getDeviceFromName(const std::string &name, llvm::AMDILSubtarget *ptr,
                  bool is64bit = false, bool is64on32bit = false);
} // namespace llvm
#endif // _AMDILDEVICEINFO_H_
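The ExecutionMode and Caps tables above are consumed as a pair of per-device bitsets, one for features implemented in hardware and one for features emulated in software. A minimal sketch of that resolution logic, assuming the `mHWBits`/`mSWBits` naming used elsewhere in this backend (the free function here is illustrative, not part of the actual interface):

```cpp
#include <bitset>

namespace AMDILDeviceInfo {
enum ExecutionMode { Unsupported = 0, Software, Hardware };
enum Caps { LocalMem = 0x9, FMA = 0xC, MaxNumberCapabilities = 0x20 };
} // namespace AMDILDeviceInfo

// Resolve a capability against the two per-device bitsets: a set
// hardware bit wins, a set software bit means "emulated in software",
// and neither bit set means the feature is unavailable.
AMDILDeviceInfo::ExecutionMode
getExecutionMode(const std::bitset<AMDILDeviceInfo::MaxNumberCapabilities> &hwBits,
                 const std::bitset<AMDILDeviceInfo::MaxNumberCapabilities> &swBits,
                 AMDILDeviceInfo::Caps cap) {
  if (hwBits[cap])
    return AMDILDeviceInfo::Hardware;
  if (swBits[cap])
    return AMDILDeviceInfo::Software;
  return AMDILDeviceInfo::Unsupported;
}
```

This mirrors how the device subclasses later in this commit call `set()`/`reset()` on the two bitsets in their `setCaps()` overrides.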


@@ -0,0 +1,19 @@
//===-- AMDILDevices.h - Consolidated AMDIL device headers -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#ifndef __AMDIL_DEVICES_H_
#define __AMDIL_DEVICES_H_
// Include all of the device specific header files
// This file is for Internal use only!
#include "AMDIL7XXDevice.h"
#include "AMDILDevice.h"
#include "AMDILEvergreenDevice.h"
#include "AMDILNIDevice.h"
#include "AMDILSIDevice.h"
#endif // __AMDIL_DEVICES_H_

File diff suppressed because it is too large.


@@ -0,0 +1,71 @@
//===-- AMDILELFWriterInfo.cpp - Elf Writer Info for AMDIL ----------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file implements ELF writer information for the AMDIL backend.
//
//===----------------------------------------------------------------------===//
#include "AMDILELFWriterInfo.h"
#include "AMDIL.h"
#include "llvm/Function.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Target/TargetELFWriterInfo.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
//===----------------------------------------------------------------------===//
// Implementation of the AMDILELFWriterInfo class
//===----------------------------------------------------------------------===//
AMDILELFWriterInfo::AMDILELFWriterInfo(bool is64bit, bool endian)
: TargetELFWriterInfo(is64bit, endian)
{
}
AMDILELFWriterInfo::~AMDILELFWriterInfo() {
}
unsigned AMDILELFWriterInfo::getRelocationType(unsigned MachineRelTy) const {
assert(0 && "What do we do here? Let's assert and analyze.");
return 0;
}
bool AMDILELFWriterInfo::hasRelocationAddend() const {
assert(0 && "What do we do here? Let's assert and analyze.");
return false;
}
long int AMDILELFWriterInfo::getDefaultAddendForRelTy(unsigned RelTy,
long int Modifier) const {
assert(0 && "What do we do here? Let's assert and analyze.");
return 0;
}
unsigned AMDILELFWriterInfo::getRelocationTySize(unsigned RelTy) const {
assert(0 && "What do we do here? Let's assert and analyze.");
return 0;
}
bool AMDILELFWriterInfo::isPCRelativeRel(unsigned RelTy) const {
assert(0 && "What do we do here? Let's assert and analyze.");
return false;
}
unsigned AMDILELFWriterInfo::getAbsoluteLabelMachineRelTy() const {
assert(0 && "What do we do here? Let's assert and analyze.");
return 0;
}
long int AMDILELFWriterInfo::computeRelocation(unsigned SymOffset,
unsigned RelOffset,
unsigned RelTy) const {
assert(0 && "What do we do here? Let's assert and analyze.");
return 0;
}


@@ -0,0 +1,54 @@
//===-- AMDILELFWriterInfo.h - Elf Writer Info for AMDIL ---------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===---------------------------------------------------------------------===//
//
// This file implements ELF writer information for the AMDIL backend.
//
//===---------------------------------------------------------------------===//
#ifndef _AMDIL_ELF_WRITER_INFO_H_
#define _AMDIL_ELF_WRITER_INFO_H_
#include "llvm/Target/TargetELFWriterInfo.h"
namespace llvm {
class AMDILELFWriterInfo : public TargetELFWriterInfo {
public:
AMDILELFWriterInfo(bool is64Bit_, bool isLittleEndian_);
virtual ~AMDILELFWriterInfo();
/// getRelocationType - Returns the target specific ELF Relocation type.
/// 'MachineRelTy' contains the object code independent relocation type
virtual unsigned getRelocationType(unsigned MachineRelTy) const;
/// hasRelocationAddend - True if the target uses an addend in the
/// ELF relocation entry.
virtual bool hasRelocationAddend() const;
/// getDefaultAddendForRelTy - Gets the default addend value for a
/// relocation entry based on the target ELF relocation type.
virtual long int getDefaultAddendForRelTy(unsigned RelTy,
long int Modifier = 0) const;
/// getRelocationTySize - Returns the size of the relocatable field in bits.
virtual unsigned getRelocationTySize(unsigned RelTy) const;
/// isPCRelativeRel - True if the relocation type is pc relative
virtual bool isPCRelativeRel(unsigned RelTy) const;
/// getAbsoluteLabelMachineRelTy - Returns the machine relocation type
/// used to reference an absolute label.
virtual unsigned getAbsoluteLabelMachineRelTy() const;
/// computeRelocation - Some relocatable fields can be relocated
/// directly, avoiding relocation symbol emission; compute the final
/// relocation value for such a symbol.
virtual long int computeRelocation(unsigned SymOffset,
unsigned RelOffset,
unsigned RelTy) const;
};
} // namespace llvm
#endif // _AMDIL_ELF_WRITER_INFO_H_
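Every hook in this interface is stubbed out with an assert in the .cpp above, but `computeRelocation` would follow the standard ELF arithmetic if it were filled in. A hedged sketch, where the `pcRelative` flag and the S/A/P decomposition are the conventional ELF terms rather than anything this interface actually defines:

```cpp
#include <cstdint>

// Standard ELF relocation arithmetic: S + A for absolute relocations,
// S + A - P for PC-relative ones, where S is the symbol offset, A the
// addend, and P the place (offset) being patched.
int64_t computeRelocationValue(uint64_t symOffset, int64_t addend,
                               uint64_t relOffset, bool pcRelative) {
  if (pcRelative)
    return static_cast<int64_t>(symOffset + addend - relOffset);
  return static_cast<int64_t>(symOffset + addend);
}
```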


@@ -0,0 +1,522 @@
//===-- AMDILEnumeratedTypes.td - The IL enumerated types -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// AMDILEnumeratedTypes.td - The IL Enumerated Types
//===--------------------------------------------------------------------===//
// Section 5.1 IL Shader
class ILShader<bits<8> val> {
bits<8> Value = val;
}
// Table 5-1
def IL_SHADER_PIXEL : ILShader<0>;
def IL_SHADER_COMPUTE : ILShader<1>;
// Section 5.2 IL RegType
class ILRegType<bits<6> val> {
bits<6> Value = val;
}
// Table 5-2
def IL_REGTYPE_TEMP : ILRegType<0>;
def IL_REGTYPE_WINCOORD : ILRegType<1>;
def IL_REGTYPE_CONST_BUF : ILRegType<2>;
def IL_REGTYPE_LITERAL : ILRegType<3>;
def IL_REGTYPE_ITEMP : ILRegType<4>;
def IL_REGTYPE_GLOBAL : ILRegType<5>;
// Section 5.3 IL Component Select
class ILComponentSelect<bits<3> val, string text> {
bits<3> Value = val;
string Text = text;
}
// Table 5-3
def IL_COMPSEL_X : ILComponentSelect<0, "x">;
def IL_COMPSEL_Y : ILComponentSelect<1, "y">;
def IL_COMPSEL_Z : ILComponentSelect<2, "z">;
def IL_COMPSEL_W : ILComponentSelect<3, "w">;
def IL_COMPSEL_0 : ILComponentSelect<4, "0">;
def IL_COMPSEL_1 : ILComponentSelect<5, "1">;
// Section 5.4 IL Mod Dst Comp
class ILModDstComp<bits<2> val, string text> {
bits<2> Value = val;
string Text = text;
}
// Table 5-4
def IL_MODCOMP_NOWRITE : ILModDstComp<0, "_">;
def IL_MODCOMP_WRITE_X : ILModDstComp<1, "x">;
def IL_MODCOMP_WRITE_y : ILModDstComp<1, "y">;
def IL_MODCOMP_WRITE_z : ILModDstComp<1, "z">;
def IL_MODCOMP_WRITE_w : ILModDstComp<1, "w">;
def IL_MODCOMP_0 : ILModDstComp<2, "0">;
def IL_MODCOMP_1 : ILModDstComp<3, "1">;
// Section 5.5 IL Import Usage
class ILImportUsage<bits<1> val, string usage> {
bits<1> Value = val;
string Text = usage;
}
// Table 5-5
def IL_IMPORTUSAGE_WINCOORD : ILImportUsage<0, "_usage(wincoord)">;
// Section 5.6 Il Shift Scale
class ILShiftScale<bits<4> val, string scale> {
bits<4> Value = val;
string Text = scale;
}
// Table 5-6
def IL_SHIFT_NONE : ILShiftScale<0, "">;
def IL_SHIFT_X2 : ILShiftScale<1, "_x2">;
def IL_SHIFT_X4 : ILShiftScale<2, "_x4">;
def IL_SHIFT_X8 : ILShiftScale<3, "_x8">;
def IL_SHIFT_D2 : ILShiftScale<4, "_d2">;
def IL_SHIFT_D4 : ILShiftScale<5, "_d4">;
def IL_SHIFT_D8 : ILShiftScale<6, "_d8">;
// Section 5.7 IL Divide Component
class ILDivComp<bits<3> val, string divcomp> {
bits<3> Value = val;
string Text = divcomp;
}
// Table 5-7
def IL_DIVCOMP_NONE : ILDivComp<0, "_divcomp(none)">;
def IL_DIVCOMP_Y : ILDivComp<1, "_divcomp(y)">;
def IL_DIVCOMP_Z : ILDivComp<2, "_divcomp(z)">;
def IL_DIVCOMP_W : ILDivComp<3, "_divcomp(w)">;
//def IL_DIVCOMP_UNKNOWN : ILDivComp<4, "_divcomp(unknown)">;
// Section 5.8 IL Relational Op
class ILRelOp<bits<3> val, string op> {
bits<3> Value = val;
string Text = op;
}
// Table 5-8
def IL_RELOP_EQ : ILRelOp<0, "_relop(eq)">;
def IL_RELOP_NE : ILRelOp<1, "_relop(ne)">;
def IL_RELOP_GT : ILRelOp<2, "_relop(gt)">;
def IL_RELOP_GE : ILRelOp<3, "_relop(ge)">;
def IL_RELOP_LT : ILRelOp<4, "_relop(lt)">;
def IL_RELOP_LE : ILRelOp<5, "_relop(le)">;
// Section 5.9 IL Zero Op
class ILZeroOp<bits<3> val, string behavior> {
bits<3> Value = val;
string Text = behavior;
}
// Table 5-9
def IL_ZEROOP_FLTMAX : ILZeroOp<0, "_zeroop(fltmax)">;
def IL_ZEROOP_0 : ILZeroOp<1, "_zeroop(zero)">;
def IL_ZEROOP_INFINITY : ILZeroOp<2, "_zeroop(infinity)">;
def IL_ZEROOP_INF_ELSE_MAX : ILZeroOp<3, "_zeroop(inf_else_max)">;
// Section 5.10 IL Cmp Value
class ILCmpValue<bits<3> val, string num> {
bits<3> Value = val;
string Text = num;
}
// Table 5-10
def IL_CMPVAL_0_0 : ILCmpValue<0, "0.0">;
def IL_CMPVAL_0_5 : ILCmpValue<1, "0.5">;
def IL_CMPVAL_1_0 : ILCmpValue<2, "1.0">;
def IL_CMPVAL_NEG_0_5 : ILCmpValue<3, "-0.5">;
def IL_CMPVAL_NEG_1_0 : ILCmpValue<4, "-1.0">;
// Section 5.11 IL Addressing
class ILAddressing<bits<3> val> {
bits<3> Value = val;
}
// Table 5-11
def IL_ADDR_ABSOLUTE : ILAddressing<0>;
def IL_ADDR_RELATIVE : ILAddressing<1>;
def IL_ADDR_REG_RELATIVE : ILAddressing<2>;
// Section 5.11 IL Element Format
class ILElementFormat<bits<5> val> {
bits<5> Value = val;
}
// Table 5-11
def IL_ELEMENTFORMAT_UNKNOWN : ILElementFormat<0>;
def IL_ELEMENTFORMAT_SNORM : ILElementFormat<1>;
def IL_ELEMENTFORMAT_UNORM : ILElementFormat<2>;
def IL_ELEMENTFORMAT_SINT : ILElementFormat<3>;
def IL_ELEMENTFORMAT_UINT : ILElementFormat<4>;
def IL_ELEMENTFORMAT_FLOAT : ILElementFormat<5>;
def IL_ELEMENTFORMAT_SRGB : ILElementFormat<6>;
def IL_ELEMENTFORMAT_MIXED : ILElementFormat<7>;
def IL_ELEMENTFORMAT_Last : ILElementFormat<8>;
// Section 5.12 IL Op Code
class ILOpCode<bits<16> val = -1, string cmd> {
bits<16> Value = val;
string Text = cmd;
}
// Table 5-12
def IL_DCL_CONST_BUFFER : ILOpCode<0, "dcl_cb">;
def IL_DCL_INDEXED_TEMP_ARRAY : ILOpCode<1, "dcl_index_temp_array">;
def IL_DCL_INPUT : ILOpCode<2, "dcl_input">;
def IL_DCL_LITERAL : ILOpCode<3, "dcl_literal">;
def IL_DCL_OUTPUT : ILOpCode<4, "dcl_output">;
def IL_DCL_RESOURCE : ILOpCode<5, "dcl_resource">;
def IL_OP_ABS : ILOpCode<6, "abs">;
def IL_OP_ADD : ILOpCode<7, "add">;
def IL_OP_AND : ILOpCode<8, "iand">;
def IL_OP_BREAK : ILOpCode<9, "break">;
def IL_OP_BREAK_LOGICALNZ : ILOpCode<10, "break_logicalnz">;
def IL_OP_BREAK_LOGICALZ : ILOpCode<11, "break_logicalz">;
def IL_OP_BREAKC : ILOpCode<12, "breakc">;
def IL_OP_CALL : ILOpCode<13, "call">;
def IL_OP_CALL_LOGICALNZ : ILOpCode<14, "call_logicalnz">;
def IL_OP_CALL_LOGICALZ : ILOpCode<15, "call_logicalz">;
def IL_OP_CASE : ILOpCode<16, "case">;
def IL_OP_CLG : ILOpCode<17, "clg">;
def IL_OP_CMOV : ILOpCode<18, "cmov">;
def IL_OP_CMOV_LOGICAL : ILOpCode<19, "cmov_logical">;
def IL_OP_CMP : ILOpCode<20, "cmp">;
def IL_OP_CONTINUE : ILOpCode<21, "continue">;
def IL_OP_CONTINUE_LOGICALNZ : ILOpCode<22, "continue_logicalnz">;
def IL_OP_CONTINUE_LOGICALZ : ILOpCode<23, "continue_logicalz">;
def IL_OP_CONTINUEC : ILOpCode<24, "continuec">;
def IL_OP_COS : ILOpCode<25, "cos">;
def IL_OP_COS_VEC : ILOpCode<26, "cos_vec">;
def IL_OP_D_2_F : ILOpCode<27, "d2f">;
def IL_OP_D_ADD : ILOpCode<28, "dadd">;
def IL_OP_D_EQ : ILOpCode<29, "deq">;
def IL_OP_D_FRC : ILOpCode<30, "dfrac">;
def IL_OP_D_FREXP : ILOpCode<31, "dfrexp">;
def IL_OP_D_GE : ILOpCode<32, "dge">;
def IL_OP_D_LDEXP : ILOpCode<33, "dldexp">;
def IL_OP_D_LT : ILOpCode<34, "dlt">;
def IL_OP_D_MAD : ILOpCode<35, "dmad">;
def IL_OP_D_MUL : ILOpCode<36, "dmul">;
def IL_OP_D_NE : ILOpCode<37, "dne">;
def IL_OP_DEFAULT : ILOpCode<38, "default">;
def IL_OP_DISCARD_LOGICALNZ : ILOpCode<39, "discard_logicalnz">;
def IL_OP_DISCARD_LOGICALZ : ILOpCode<40, "discard_logicalz">;
def IL_OP_DIV : ILOpCode<41, "div_zeroop(infinity)">;
def IL_OP_DP2 : ILOpCode<42, "dp2">;
def IL_OP_DP3 : ILOpCode<43, "dp3">;
def IL_OP_DP4 : ILOpCode<44, "dp4">;
def IL_OP_ELSE : ILOpCode<45, "else">;
def IL_OP_END : ILOpCode<46, "end">;
def IL_OP_ENDFUNC : ILOpCode<47, "endfunc">;
def IL_OP_ENDIF : ILOpCode<48, "endif">;
def IL_OP_ENDLOOP : ILOpCode<49, "endloop">;
def IL_OP_ENDMAIN : ILOpCode<50, "endmain">;
def IL_OP_ENDSWITCH : ILOpCode<51, "endswitch">;
def IL_OP_EQ : ILOpCode<52, "eq">;
def IL_OP_EXP : ILOpCode<53, "exp">;
def IL_OP_EXP_VEC : ILOpCode<54, "exp_vec">;
def IL_OP_F_2_D : ILOpCode<55, "f2d">;
def IL_OP_FLR : ILOpCode<56, "flr">;
def IL_OP_FRC : ILOpCode<57, "frc">;
def IL_OP_FTOI : ILOpCode<58, "ftoi">;
def IL_OP_FTOU : ILOpCode<59, "ftou">;
def IL_OP_FUNC : ILOpCode<60, "func">;
def IL_OP_GE : ILOpCode<61, "ge">;
def IL_OP_I_ADD : ILOpCode<62, "iadd">;
def IL_OP_I_EQ : ILOpCode<63, "ieq">;
def IL_OP_I_GE : ILOpCode<64, "ige">;
def IL_OP_I_LT : ILOpCode<65, "ilt">;
def IL_OP_I_MAD : ILOpCode<66, "imad">;
def IL_OP_I_MAX : ILOpCode<67, "imax">;
def IL_OP_I_MIN : ILOpCode<68, "imin">;
def IL_OP_I_MUL : ILOpCode<69, "imul">;
def IL_OP_I_MUL_HIGH : ILOpCode<70, "imul_high">;
def IL_OP_I_NE : ILOpCode<71, "ine">;
def IL_OP_I_NEGATE : ILOpCode<72, "inegate">;
def IL_OP_I_NOT : ILOpCode<73, "inot">;
def IL_OP_I_OR : ILOpCode<74, "ior">;
def IL_OP_I_SHL : ILOpCode<75, "ishl">;
def IL_OP_I_SHR : ILOpCode<76, "ishr">;
def IL_OP_I_XOR : ILOpCode<77, "ixor">;
def IL_OP_IF_LOGICALNZ : ILOpCode<78, "if_logicalnz">;
def IL_OP_IF_LOGICALZ : ILOpCode<79, "if_logicalz">;
def IL_OP_IFC : ILOpCode<80, "ifc">;
def IL_OP_ITOF : ILOpCode<81, "itof">;
def IL_OP_LN : ILOpCode<82, "ln">;
def IL_OP_LOG : ILOpCode<83, "log">;
def IL_OP_LOG_VEC : ILOpCode<84, "log_vec">;
def IL_OP_LOOP : ILOpCode<85, "loop">;
def IL_OP_LT : ILOpCode<86, "lt">;
def IL_OP_MAD : ILOpCode<87, "mad_ieee">;
def IL_OP_MAX : ILOpCode<88, "max_ieee">;
def IL_OP_MIN : ILOpCode<89, "min_ieee">;
def IL_OP_MOD : ILOpCode<90, "mod_ieee">;
def IL_OP_MOV : ILOpCode<91, "mov">;
def IL_OP_MUL_IEEE : ILOpCode<92, "mul_ieee">;
def IL_OP_NE : ILOpCode<93, "ne">;
def IL_OP_NRM : ILOpCode<94, "nrm_nrm4_zeroop(zero)">;
def IL_OP_POW : ILOpCode<95, "pow">;
def IL_OP_RCP : ILOpCode<96, "rcp">;
def IL_OP_RET : ILOpCode<97, "ret">;
def IL_OP_RET_DYN : ILOpCode<98, "ret_dyn">;
def IL_OP_RET_LOGICALNZ : ILOpCode<99, "ret_logicalnz">;
def IL_OP_RET_LOGICALZ : ILOpCode<100, "ret_logicalz">;
def IL_OP_RND : ILOpCode<101, "rnd">;
def IL_OP_ROUND_NEAR : ILOpCode<102, "round_nearest">;
def IL_OP_ROUND_NEG_INF : ILOpCode<103, "round_neginf">;
def IL_OP_ROUND_POS_INF : ILOpCode<104, "round_plusinf">;
def IL_OP_ROUND_ZERO : ILOpCode<105, "round_z">;
def IL_OP_RSQ : ILOpCode<106, "rsq">;
def IL_OP_RSQ_VEC : ILOpCode<107, "rsq_vec">;
def IL_OP_SAMPLE : ILOpCode<108, "sample">;
def IL_OP_SAMPLE_L : ILOpCode<109, "sample_l">;
def IL_OP_SET : ILOpCode<110, "set">;
def IL_OP_SGN : ILOpCode<111, "sgn">;
def IL_OP_SIN : ILOpCode<112, "sin">;
def IL_OP_SIN_VEC : ILOpCode<113, "sin_vec">;
def IL_OP_SUB : ILOpCode<114, "sub">;
def IL_OP_SWITCH : ILOpCode<115, "switch">;
def IL_OP_TRC : ILOpCode<116, "trc">;
def IL_OP_U_DIV : ILOpCode<117, "udiv">;
def IL_OP_U_GE : ILOpCode<118, "uge">;
def IL_OP_U_LT : ILOpCode<119, "ult">;
def IL_OP_U_MAD : ILOpCode<120, "umad">;
def IL_OP_U_MAX : ILOpCode<121, "umax">;
def IL_OP_U_MIN : ILOpCode<122, "umin">;
def IL_OP_U_MOD : ILOpCode<123, "umod">;
def IL_OP_U_MUL : ILOpCode<124, "umul">;
def IL_OP_U_MUL_HIGH : ILOpCode<125, "umul_high">;
def IL_OP_U_SHR : ILOpCode<126, "ushr">;
def IL_OP_UTOF : ILOpCode<127, "utof">;
def IL_OP_WHILE : ILOpCode<128, "whileloop">;
// SC IL instructions that are not in CAL IL
def IL_OP_ACOS : ILOpCode<129, "acos">;
def IL_OP_ASIN : ILOpCode<130, "asin">;
def IL_OP_EXN : ILOpCode<131, "exn">;
def IL_OP_UBIT_REVERSE : ILOpCode<132, "ubit_reverse">;
def IL_OP_UBIT_EXTRACT : ILOpCode<133, "ubit_extract">;
def IL_OP_IBIT_EXTRACT : ILOpCode<134, "ibit_extract">;
def IL_OP_SQRT : ILOpCode<135, "sqrt">;
def IL_OP_SQRT_VEC : ILOpCode<136, "sqrt_vec">;
def IL_OP_ATAN : ILOpCode<137, "atan">;
def IL_OP_TAN : ILOpCode<137, "tan">;
def IL_OP_D_DIV : ILOpCode<138, "ddiv">;
def IL_OP_F_NEG : ILOpCode<139, "mov">;
def IL_OP_GT : ILOpCode<140, "gt">;
def IL_OP_LE : ILOpCode<141, "lt">;
def IL_OP_DIST : ILOpCode<142, "dist">;
def IL_OP_LEN : ILOpCode<143, "len">;
def IL_OP_MACRO : ILOpCode<144, "mcall">;
def IL_OP_INTR : ILOpCode<145, "call">;
def IL_OP_I_FFB_HI : ILOpCode<146, "ffb_hi">;
def IL_OP_I_FFB_LO : ILOpCode<147, "ffb_lo">;
def IL_OP_BARRIER : ILOpCode<148, "fence_threads_memory_lds">;
def IL_OP_BARRIER_LOCAL : ILOpCode<149, "fence_threads_lds">;
def IL_OP_BARRIER_GLOBAL : ILOpCode<150, "fence_threads_memory">;
def IL_OP_FENCE : ILOpCode<151, "fence_lds_memory">;
def IL_OP_FENCE_READ_ONLY : ILOpCode<152, "fence_lds_mem_read_only">;
def IL_OP_FENCE_WRITE_ONLY : ILOpCode<153, "fence_lds_mem_write_only">;
def IL_PSEUDO_INST : ILOpCode<154, ";Pseudo Op">;
def IL_OP_UNPACK_0 : ILOpCode<155, "unpack0">;
def IL_OP_UNPACK_1 : ILOpCode<156, "unpack1">;
def IL_OP_UNPACK_2 : ILOpCode<157, "unpack2">;
def IL_OP_UNPACK_3 : ILOpCode<158, "unpack3">;
def IL_OP_PI_REDUCE : ILOpCode<159, "pireduce">;
def IL_OP_IBIT_COUNT : ILOpCode<160, "icbits">;
def IL_OP_I_FFB_SGN : ILOpCode<161, "ffb_shi">;
def IL_OP_F2U4 : ILOpCode<162, "f_2_u4">;
def IL_OP_BIT_ALIGN : ILOpCode<163, "bitalign">;
def IL_OP_BYTE_ALIGN : ILOpCode<164, "bytealign">;
def IL_OP_U4_LERP : ILOpCode<165, "u4lerp">;
def IL_OP_SAD : ILOpCode<166, "sad">;
def IL_OP_SAD_HI : ILOpCode<167, "sadhi">;
def IL_OP_SAD4 : ILOpCode<168, "sad4">;
def IL_OP_UBIT_INSERT : ILOpCode<169, "ubit_insert">;
def IL_OP_I_CARRY : ILOpCode<170, "icarry">;
def IL_OP_I_BORROW : ILOpCode<171, "iborrow">;
def IL_OP_U_MAD24 : ILOpCode<172, "umad24">;
def IL_OP_U_MUL24 : ILOpCode<173, "umul24">;
def IL_OP_I_MAD24 : ILOpCode<174, "imad24">;
def IL_OP_I_MUL24 : ILOpCode<175, "imul24">;
def IL_OP_CLAMP : ILOpCode<176, "clamp">;
def IL_OP_LERP : ILOpCode<177, "lrp">;
def IL_OP_FMA : ILOpCode<178, "fma">;
def IL_OP_D_MIN : ILOpCode<179, "dmin">;
def IL_OP_D_MAX : ILOpCode<180, "dmax">;
def IL_OP_D_SQRT : ILOpCode<181, "dsqrt">;
def IL_OP_DP2_ADD : ILOpCode<182, "dp2add">;
def IL_OP_F16_TO_F32 : ILOpCode<183, "f162f">;
def IL_OP_F32_TO_F16 : ILOpCode<184, "f2f16">;
def IL_REG_LOCAL_ID_FLAT : ILOpCode<185, "vTidInGrpFlat">;
def IL_REG_LOCAL_ID : ILOpCode<186, "vTidInGrp">;
def IL_REG_GLOBAL_ID_FLAT : ILOpCode<187, "vAbsTidFlag">;
def IL_REG_GLOBAL_ID : ILOpCode<188, "vAbsTid">;
def IL_REG_GROUP_ID_FLAT : ILOpCode<189, "vThreadGrpIDFlat">;
def IL_REG_GROUP_ID : ILOpCode<190, "vThreadGrpID">;
def IL_OP_D_RCP : ILOpCode<191, "drcp_zeroop(infinity)">;
def IL_OP_D_RSQ : ILOpCode<192, "drsq_zeroop(infinity)">;
def IL_OP_D_MOV : ILOpCode<193, "dmov">;
def IL_OP_D_MOVC : ILOpCode<194, "dmovc">;
def IL_OP_NOP : ILOpCode<195, "nop">;
def IL_OP_UAV_ADD : ILOpCode<196, "uav_add">;
def IL_OP_UAV_AND : ILOpCode<197, "uav_and">;
def IL_OP_UAV_MAX : ILOpCode<198, "uav_max">;
def IL_OP_UAV_MIN : ILOpCode<199, "uav_min">;
def IL_OP_UAV_OR : ILOpCode<200, "uav_or">;
def IL_OP_UAV_RSUB : ILOpCode<201, "uav_rsub">;
def IL_OP_UAV_SUB : ILOpCode<202, "uav_sub">;
def IL_OP_UAV_UMAX : ILOpCode<203, "uav_umax">;
def IL_OP_UAV_UMIN : ILOpCode<204, "uav_umin">;
def IL_OP_UAV_XOR : ILOpCode<205, "uav_xor">;
def IL_OP_UAV_INC : ILOpCode<206, "uav_uinc">;
def IL_OP_UAV_DEC : ILOpCode<207, "uav_udec">;
def IL_OP_UAV_CMP : ILOpCode<208, "uav_cmp">;
def IL_OP_UAV_READ_ADD : ILOpCode<209, "uav_read_add">;
def IL_OP_UAV_READ_AND : ILOpCode<210, "uav_read_and">;
def IL_OP_UAV_READ_MAX : ILOpCode<211, "uav_read_max">;
def IL_OP_UAV_READ_MIN : ILOpCode<212, "uav_read_min">;
def IL_OP_UAV_READ_OR : ILOpCode<213, "uav_read_or">;
def IL_OP_UAV_READ_RSUB : ILOpCode<214, "uav_read_rsub">;
def IL_OP_UAV_READ_SUB : ILOpCode<215, "uav_read_sub">;
def IL_OP_UAV_READ_UMAX : ILOpCode<216, "uav_read_umax">;
def IL_OP_UAV_READ_UMIN : ILOpCode<217, "uav_read_umin">;
def IL_OP_UAV_READ_XOR : ILOpCode<218, "uav_read_xor">;
def IL_OP_UAV_READ_INC : ILOpCode<219, "uav_read_uinc">;
def IL_OP_UAV_READ_DEC : ILOpCode<220, "uav_read_udec">;
def IL_OP_UAV_READ_XCHG : ILOpCode<221, "uav_read_xchg">;
def IL_OP_UAV_READ_CMPXCHG : ILOpCode<222, "uav_read_cmp_xchg">;
def IL_OP_LDS_ADD : ILOpCode<223, "lds_add">;
def IL_OP_LDS_AND : ILOpCode<224, "lds_and">;
def IL_OP_LDS_MAX : ILOpCode<225, "lds_max">;
def IL_OP_LDS_MIN : ILOpCode<226, "lds_min">;
def IL_OP_LDS_OR : ILOpCode<227, "lds_or">;
def IL_OP_LDS_RSUB : ILOpCode<228, "lds_rsub">;
def IL_OP_LDS_SUB : ILOpCode<229, "lds_sub">;
def IL_OP_LDS_UMAX : ILOpCode<230, "lds_umax">;
def IL_OP_LDS_UMIN : ILOpCode<231, "lds_umin">;
def IL_OP_LDS_XOR : ILOpCode<232, "lds_xor">;
def IL_OP_LDS_INC : ILOpCode<233, "lds_inc">;
def IL_OP_LDS_DEC : ILOpCode<234, "lds_dec">;
def IL_OP_LDS_CMP : ILOpCode<235, "lds_cmp">;
def IL_OP_LDS_READ_ADD : ILOpCode<236, "lds_read_add">;
def IL_OP_LDS_READ_AND : ILOpCode<237, "lds_read_and">;
def IL_OP_LDS_READ_MAX : ILOpCode<238, "lds_read_max">;
def IL_OP_LDS_READ_MIN : ILOpCode<239, "lds_read_min">;
def IL_OP_LDS_READ_OR : ILOpCode<240, "lds_read_or">;
def IL_OP_LDS_READ_RSUB : ILOpCode<241, "lds_read_rsub">;
def IL_OP_LDS_READ_SUB : ILOpCode<242, "lds_read_sub">;
def IL_OP_LDS_READ_UMAX : ILOpCode<243, "lds_read_umax">;
def IL_OP_LDS_READ_UMIN : ILOpCode<244, "lds_read_umin">;
def IL_OP_LDS_READ_XOR : ILOpCode<245, "lds_read_xor">;
def IL_OP_LDS_READ_INC : ILOpCode<246, "lds_read_inc">;
def IL_OP_LDS_READ_DEC : ILOpCode<247, "lds_read_dec">;
def IL_OP_LDS_READ_XCHG : ILOpCode<248, "lds_read_xchg">;
def IL_OP_LDS_READ_CMPXCHG : ILOpCode<249, "lds_read_cmp_xchg">;
def IL_OP_GDS_ADD : ILOpCode<250, "gds_add">;
def IL_OP_GDS_AND : ILOpCode<251, "gds_and">;
def IL_OP_GDS_MAX : ILOpCode<252, "gds_max">;
def IL_OP_GDS_MIN : ILOpCode<253, "gds_min">;
def IL_OP_GDS_OR : ILOpCode<254, "gds_or">;
def IL_OP_GDS_RSUB : ILOpCode<255, "gds_rsub">;
def IL_OP_GDS_SUB : ILOpCode<256, "gds_sub">;
def IL_OP_GDS_UMAX : ILOpCode<257, "gds_umax">;
def IL_OP_GDS_UMIN : ILOpCode<258, "gds_umin">;
def IL_OP_GDS_MSKOR : ILOpCode<259, "gds_mskor">;
def IL_OP_GDS_XOR : ILOpCode<260, "gds_xor">;
def IL_OP_GDS_INC : ILOpCode<261, "gds_inc">;
def IL_OP_GDS_DEC : ILOpCode<262, "gds_dec">;
def IL_OP_GDS_CMP : ILOpCode<263, "gds_cmp">;
def IL_OP_GDS_READ_ADD : ILOpCode<264, "gds_read_add">;
def IL_OP_GDS_READ_AND : ILOpCode<265, "gds_read_and">;
def IL_OP_GDS_READ_MAX : ILOpCode<266, "gds_read_max">;
def IL_OP_GDS_READ_MIN : ILOpCode<267, "gds_read_min">;
def IL_OP_GDS_READ_OR : ILOpCode<268, "gds_read_or">;
def IL_OP_GDS_READ_RSUB : ILOpCode<269, "gds_read_rsub">;
def IL_OP_GDS_READ_SUB : ILOpCode<270, "gds_read_sub">;
def IL_OP_GDS_READ_UMAX : ILOpCode<271, "gds_read_umax">;
def IL_OP_GDS_READ_UMIN : ILOpCode<272, "gds_read_umin">;
def IL_OP_GDS_READ_MSKOR : ILOpCode<273, "gds_read_mskor">;
def IL_OP_GDS_READ_XOR : ILOpCode<274, "gds_read_xor">;
def IL_OP_GDS_READ_INC : ILOpCode<275, "gds_read_inc">;
def IL_OP_GDS_READ_DEC : ILOpCode<276, "gds_read_dec">;
def IL_OP_GDS_READ_XCHG : ILOpCode<277, "gds_read_xchg">;
def IL_OP_GDS_READ_CMPXCHG : ILOpCode<278, "gds_read_cmp_xchg">;
def IL_OP_APPEND_BUF_ALLOC : ILOpCode<279, "append_buf_alloc">;
def IL_OP_APPEND_BUF_CONSUME : ILOpCode<280, "append_buf_consume">;
def IL_OP_I64_ADD : ILOpCode<281, "i64add">;
def IL_OP_I64_MAX : ILOpCode<282, "i64max">;
def IL_OP_U64_MAX : ILOpCode<283, "u64max">;
def IL_OP_I64_MIN : ILOpCode<284, "i64min">;
def IL_OP_U64_MIN : ILOpCode<285, "u64min">;
def IL_OP_I64_NEGATE : ILOpCode<286, "i64negate">;
def IL_OP_I64_SHL : ILOpCode<287, "i64shl">;
def IL_OP_I64_SHR : ILOpCode<288, "i64shr">;
def IL_OP_U64_SHR : ILOpCode<289, "u64shr">;
def IL_OP_I64_EQ : ILOpCode<290, "i64eq">;
def IL_OP_I64_GE : ILOpCode<291, "i64ge">;
def IL_OP_U64_GE : ILOpCode<292, "u64ge">;
def IL_OP_I64_LT : ILOpCode<293, "i64lt">;
def IL_OP_U64_LT : ILOpCode<294, "u64lt">;
def IL_OP_I64_NE : ILOpCode<295, "i64ne">;
def IL_OP_U_MULHI24 : ILOpCode<296, "umul24_high">;
def IL_OP_I_MULHI24 : ILOpCode<297, "imul24_high">;
def IL_OP_GDS_LOAD : ILOpCode<298, "gds_load">;
def IL_OP_GDS_STORE : ILOpCode<299, "gds_store">;
def IL_OP_LDS_LOAD : ILOpCode<300, "lds_load">;
def IL_OP_LDS_LOAD_VEC : ILOpCode<301, "lds_load_vec">;
def IL_OP_LDS_LOAD_BYTE : ILOpCode<302, "lds_load_byte">;
def IL_OP_LDS_LOAD_UBYTE : ILOpCode<303, "lds_load_ubyte">;
def IL_OP_LDS_LOAD_SHORT : ILOpCode<304, "lds_load_short">;
def IL_OP_LDS_LOAD_USHORT : ILOpCode<305, "lds_load_ushort">;
def IL_OP_LDS_STORE : ILOpCode<306, "lds_store">;
def IL_OP_LDS_STORE_VEC : ILOpCode<307, "lds_store_vec">;
def IL_OP_LDS_STORE_BYTE : ILOpCode<308, "lds_store_byte">;
def IL_OP_LDS_STORE_SHORT : ILOpCode<309, "lds_store_short">;
def IL_OP_RAW_UAV_LOAD : ILOpCode<310, "uav_raw_load">;
def IL_OP_RAW_UAV_STORE : ILOpCode<311, "uav_raw_store">;
def IL_OP_ARENA_UAV_LOAD : ILOpCode<312, "uav_arena_load">;
def IL_OP_ARENA_UAV_STORE : ILOpCode<313, "uav_arena_store">;
def IL_OP_LDS_MSKOR : ILOpCode<314, "lds_mskor">;
def IL_OP_LDS_READ_MSKOR : ILOpCode<315, "lds_read_mskor">;
def IL_OP_UAV_BYTE_LOAD : ILOpCode<316, "uav_byte_load">;
def IL_OP_UAV_UBYTE_LOAD : ILOpCode<317, "uav_ubyte_load">;
def IL_OP_UAV_SHORT_LOAD : ILOpCode<318, "uav_short_load">;
def IL_OP_UAV_USHORT_LOAD : ILOpCode<319, "uav_ushort_load">;
def IL_OP_UAV_BYTE_STORE : ILOpCode<320, "uav_byte_store">;
def IL_OP_UAV_SHORT_STORE : ILOpCode<320, "uav_short_store">;
def IL_OP_UAV_STORE : ILOpCode<321, "uav_store">;
def IL_OP_UAV_LOAD : ILOpCode<322, "uav_load">;
def IL_OP_MUL : ILOpCode<323, "mul">;
def IL_OP_DIV_INF : ILOpCode<324, "div_zeroop(infinity)">;
def IL_OP_DIV_FLTMAX : ILOpCode<325, "div_zeroop(fltmax)">;
def IL_OP_DIV_ZERO : ILOpCode<326, "div_zeroop(zero)">;
def IL_OP_DIV_INFELSEMAX : ILOpCode<327, "div_zeroop(inf_else_max)">;
def IL_OP_FTOI_FLR : ILOpCode<328, "ftoi_flr">;
def IL_OP_FTOI_RPI : ILOpCode<329, "ftoi_rpi">;
def IL_OP_F32_TO_F16_NEAR : ILOpCode<330, "f2f16_near">;
def IL_OP_F32_TO_F16_NEG_INF : ILOpCode<331, "f2f16_neg_inf">;
def IL_OP_F32_TO_F16_PLUS_INF : ILOpCode<332, "f2f16_plus_inf">;
def IL_OP_I64_MUL : ILOpCode<333, "i64mul">;
def IL_OP_U64_MUL : ILOpCode<334, "u64mul">;
def IL_OP_CU_ID : ILOpCode<355, "cu_id">;
def IL_OP_WAVE_ID : ILOpCode<356, "wave_id">;
def IL_OP_I64_SUB : ILOpCode<357, "i64sub">;
def IL_OP_I64_DIV : ILOpCode<358, "i64div">;
def IL_OP_U64_DIV : ILOpCode<359, "u64div">;
def IL_OP_I64_MOD : ILOpCode<360, "i64mod">;
def IL_OP_U64_MOD : ILOpCode<361, "u64mod">;
def IL_DCL_GWS_THREAD_COUNT : ILOpCode<362, "dcl_gws_thread_count">;
def IL_DCL_SEMAPHORE : ILOpCode<363, "dcl_semaphore">;
def IL_OP_SEMAPHORE_INIT : ILOpCode<364, "init_semaphore">;
def IL_OP_SEMAPHORE_WAIT : ILOpCode<365, "semaphore_wait">;
def IL_OP_SEMAPHORE_SIGNAL : ILOpCode<366, "semaphore_signal">;
def IL_OP_BARRIER_REGION : ILOpCode<377, "fence_threads_gds">;
def IL_OP_BFI : ILOpCode<394, "bfi">;
def IL_OP_BFM : ILOpCode<395, "bfm">;
def IL_DBG_STRING : ILOpCode<396, "dbg_string">;
def IL_DBG_LINE : ILOpCode<397, "dbg_line">;
def IL_DBG_TEMPLOC : ILOpCode<398, "dbg_temploc">;
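The `Text` field of each `ILOpCode` record above is the mnemonic an IL assembly printer would emit for that opcode. A tiny illustrative lookup in that spirit; the helper name and map are hypothetical, while the value/mnemonic pairs are taken directly from the table:

```cpp
#include <map>
#include <string>

// A few entries lifted from the ILOpCode table: record value -> IL mnemonic.
static const std::map<int, std::string> kILMnemonics = {
    {7, "add"},         // IL_OP_ADD
    {91, "mov"},        // IL_OP_MOV
    {92, "mul_ieee"},   // IL_OP_MUL_IEEE
    {128, "whileloop"}, // IL_OP_WHILE
};

// Return the textual mnemonic for an opcode value, or a placeholder
// when the opcode is not in the (deliberately small) table.
std::string ilMnemonicFor(int opcode) {
  auto it = kILMnemonics.find(opcode);
  return it == kILMnemonics.end() ? "<unknown>" : it->second;
}
```

In the real backend, TableGen generates this mapping from the records above rather than anyone writing it by hand.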


@@ -0,0 +1,211 @@
//===-- AMDILEvergreenDevice.cpp - Evergreen device implementation -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILEvergreenDevice.h"
#ifdef UPSTREAM_LLVM
#include "AMDILEGAsmPrinter.h"
#endif
#include "AMDILIOExpansion.h"
#include "AMDILPointerManager.h"
using namespace llvm;
AMDILEvergreenDevice::AMDILEvergreenDevice(AMDILSubtarget *ST)
: AMDILDevice(ST) {
setCaps();
std::string name = ST->getDeviceName();
if (name == "cedar") {
mDeviceFlag = OCL_DEVICE_CEDAR;
} else if (name == "redwood") {
mDeviceFlag = OCL_DEVICE_REDWOOD;
} else if (name == "cypress") {
mDeviceFlag = OCL_DEVICE_CYPRESS;
} else {
mDeviceFlag = OCL_DEVICE_JUNIPER;
}
}
AMDILEvergreenDevice::~AMDILEvergreenDevice() {
}
size_t AMDILEvergreenDevice::getMaxLDSSize() const {
if (usesHardware(AMDILDeviceInfo::LocalMem)) {
return MAX_LDS_SIZE_800;
} else {
return 0;
}
}
size_t AMDILEvergreenDevice::getMaxGDSSize() const {
if (usesHardware(AMDILDeviceInfo::RegionMem)) {
return MAX_LDS_SIZE_800;
} else {
return 0;
}
}
uint32_t AMDILEvergreenDevice::getMaxNumUAVs() const {
return 12;
}
uint32_t AMDILEvergreenDevice::getResourceID(uint32_t id) const {
switch(id) {
default:
assert(0 && "ID type passed in is unknown!");
break;
case CONSTANT_ID:
case RAW_UAV_ID:
if (mSTM->calVersion() >= CAL_VERSION_GLOBAL_RETURN_BUFFER) {
return GLOBAL_RETURN_RAW_UAV_ID;
} else {
return DEFAULT_RAW_UAV_ID;
}
case GLOBAL_ID:
case ARENA_UAV_ID:
return DEFAULT_ARENA_UAV_ID;
case LDS_ID:
if (usesHardware(AMDILDeviceInfo::LocalMem)) {
return DEFAULT_LDS_ID;
} else {
return DEFAULT_ARENA_UAV_ID;
}
case GDS_ID:
if (usesHardware(AMDILDeviceInfo::RegionMem)) {
return DEFAULT_GDS_ID;
} else {
return DEFAULT_ARENA_UAV_ID;
}
case SCRATCH_ID:
if (usesHardware(AMDILDeviceInfo::PrivateMem)) {
return DEFAULT_SCRATCH_ID;
} else {
return DEFAULT_ARENA_UAV_ID;
}
}
return 0;
}
size_t AMDILEvergreenDevice::getWavefrontSize() const {
return AMDILDevice::WavefrontSize;
}
uint32_t AMDILEvergreenDevice::getGeneration() const {
return AMDILDeviceInfo::HD5XXX;
}
void AMDILEvergreenDevice::setCaps() {
mSWBits.set(AMDILDeviceInfo::ArenaSegment);
mHWBits.set(AMDILDeviceInfo::ArenaUAV);
if (mSTM->calVersion() >= CAL_VERSION_SC_140) {
mHWBits.set(AMDILDeviceInfo::HW64BitDivMod);
mSWBits.reset(AMDILDeviceInfo::HW64BitDivMod);
}
mSWBits.set(AMDILDeviceInfo::Signed24BitOps);
if (mSTM->isOverride(AMDILDeviceInfo::ByteStores)) {
mHWBits.set(AMDILDeviceInfo::ByteStores);
}
if (mSTM->isOverride(AMDILDeviceInfo::Debug)) {
mSWBits.set(AMDILDeviceInfo::LocalMem);
mSWBits.set(AMDILDeviceInfo::RegionMem);
} else {
mHWBits.set(AMDILDeviceInfo::LocalMem);
mHWBits.set(AMDILDeviceInfo::RegionMem);
}
mHWBits.set(AMDILDeviceInfo::Images);
if (mSTM->isOverride(AMDILDeviceInfo::NoAlias)) {
mHWBits.set(AMDILDeviceInfo::NoAlias);
}
if (mSTM->calVersion() > CAL_VERSION_GLOBAL_RETURN_BUFFER) {
mHWBits.set(AMDILDeviceInfo::CachedMem);
}
if (mSTM->isOverride(AMDILDeviceInfo::MultiUAV)) {
mHWBits.set(AMDILDeviceInfo::MultiUAV);
}
if (mSTM->calVersion() > CAL_VERSION_SC_136) {
mHWBits.set(AMDILDeviceInfo::ByteLDSOps);
mSWBits.reset(AMDILDeviceInfo::ByteLDSOps);
mHWBits.set(AMDILDeviceInfo::ArenaVectors);
} else {
mSWBits.set(AMDILDeviceInfo::ArenaVectors);
}
if (mSTM->calVersion() > CAL_VERSION_SC_137) {
mHWBits.set(AMDILDeviceInfo::LongOps);
mSWBits.reset(AMDILDeviceInfo::LongOps);
}
mHWBits.set(AMDILDeviceInfo::TmrReg);
}
FunctionPass*
AMDILEvergreenDevice::getIOExpansion(
TargetMachine& TM AMDIL_OPT_LEVEL_DECL) const
{
return new AMDILEGIOExpansion(TM AMDIL_OPT_LEVEL_VAR);
}
AsmPrinter*
AMDILEvergreenDevice::getAsmPrinter(TargetMachine& TM, MCStreamer &Streamer) const
{
#ifdef UPSTREAM_LLVM
return new AMDILEGAsmPrinter(TM, Streamer);
#else
return NULL;
#endif
}
FunctionPass*
AMDILEvergreenDevice::getPointerManager(
TargetMachine& TM AMDIL_OPT_LEVEL_DECL) const
{
return new AMDILEGPointerManager(TM AMDIL_OPT_LEVEL_VAR);
}
AMDILCypressDevice::AMDILCypressDevice(AMDILSubtarget *ST)
: AMDILEvergreenDevice(ST) {
setCaps();
}
AMDILCypressDevice::~AMDILCypressDevice() {
}
void AMDILCypressDevice::setCaps() {
if (mSTM->isOverride(AMDILDeviceInfo::DoubleOps)) {
mHWBits.set(AMDILDeviceInfo::DoubleOps);
mHWBits.set(AMDILDeviceInfo::FMA);
}
}
AMDILCedarDevice::AMDILCedarDevice(AMDILSubtarget *ST)
: AMDILEvergreenDevice(ST) {
setCaps();
}
AMDILCedarDevice::~AMDILCedarDevice() {
}
void AMDILCedarDevice::setCaps() {
mSWBits.set(AMDILDeviceInfo::FMA);
}
size_t AMDILCedarDevice::getWavefrontSize() const {
return AMDILDevice::QuarterWavefrontSize;
}
AMDILRedwoodDevice::AMDILRedwoodDevice(AMDILSubtarget *ST)
: AMDILEvergreenDevice(ST) {
setCaps();
}
AMDILRedwoodDevice::~AMDILRedwoodDevice()
{
}
void AMDILRedwoodDevice::setCaps() {
mSWBits.set(AMDILDeviceInfo::FMA);
}
size_t AMDILRedwoodDevice::getWavefrontSize() const {
return AMDILDevice::HalfWavefrontSize;
}
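The getWavefrontSize overrides form a simple ladder: Juniper-class Evergreen parts report the full wavefront, Redwood half of it, and Cedar a quarter. A sketch of that hierarchy, assuming the conventional 64/32/16 values (the real constants are defined in AMDILDevice and may differ):

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for AMDILDevice::WavefrontSize and friends; the real
// constants live in AMDILDevice.
constexpr std::size_t WavefrontSize        = 64; // Juniper-class Evergreen
constexpr std::size_t HalfWavefrontSize    = 32; // Redwood (HD55XX/HD56XX)
constexpr std::size_t QuarterWavefrontSize = 16; // Cedar (HD53XX/HD54XX)

struct EvergreenDevice {
  virtual ~EvergreenDevice() = default;
  virtual std::size_t getWavefrontSize() const { return WavefrontSize; }
};
struct RedwoodDevice : EvergreenDevice {
  std::size_t getWavefrontSize() const override { return HalfWavefrontSize; }
};
struct CedarDevice : EvergreenDevice {
  std::size_t getWavefrontSize() const override { return QuarterWavefrontSize; }
};
```

Whatever the absolute values, the 2:1 and 4:1 ratios between the classes are what the code above encodes.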


@@ -0,0 +1,93 @@
//==- AMDILEvergreenDevice.h - Define Evergreen Device for AMDIL -*- C++ -*--=//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface for the subtarget data classes.
//
//===----------------------------------------------------------------------===//
// This file will define the interface that each generation needs to
// implement in order to correctly answer queries on the capabilities of the
// specific hardware.
//===----------------------------------------------------------------------===//
#ifndef _AMDILEVERGREENDEVICE_H_
#define _AMDILEVERGREENDEVICE_H_
#include "AMDILDevice.h"
#include "AMDILSubtarget.h"
namespace llvm {
class AMDILSubtarget;
//===----------------------------------------------------------------------===//
// Evergreen generation of devices and their respective sub classes
//===----------------------------------------------------------------------===//
// The AMDILEvergreenDevice is the base device class for all of the Evergreen
// series of cards. This class contains information required to differentiate
// the Evergreen device from the generic AMDILDevice. This device represents
// the capabilities of the 'Juniper' cards, also known as the HD57XX.
class AMDILEvergreenDevice : public AMDILDevice {
public:
AMDILEvergreenDevice(AMDILSubtarget *ST);
virtual ~AMDILEvergreenDevice();
virtual size_t getMaxLDSSize() const;
virtual size_t getMaxGDSSize() const;
virtual size_t getWavefrontSize() const;
virtual uint32_t getGeneration() const;
virtual uint32_t getMaxNumUAVs() const;
virtual uint32_t getResourceID(uint32_t) const;
virtual FunctionPass*
getIOExpansion(TargetMachine& AMDIL_OPT_LEVEL_DECL) const;
virtual AsmPrinter*
getAsmPrinter(TargetMachine& TM, MCStreamer &Streamer) const;
virtual FunctionPass*
getPointerManager(TargetMachine& AMDIL_OPT_LEVEL_DECL) const;
protected:
virtual void setCaps();
}; // AMDILEvergreenDevice
// The AMDILCypressDevice is similar to the AMDILEvergreenDevice, except it has
// support for double precision operations. This device is used to represent
// both the Cypress and Hemlock cards, which are commercially known as HD58XX
// and HD59XX cards.
class AMDILCypressDevice : public AMDILEvergreenDevice {
public:
AMDILCypressDevice(AMDILSubtarget *ST);
virtual ~AMDILCypressDevice();
private:
virtual void setCaps();
}; // AMDILCypressDevice
// The AMDILCedarDevice is the class that represents all of the 'Cedar' based
// devices. This class differs from the base AMDILEvergreenDevice in that the
// device is a ~quarter of the 'Juniper'. These are commercially known as the
// HD54XX and HD53XX series of cards.
class AMDILCedarDevice : public AMDILEvergreenDevice {
public:
AMDILCedarDevice(AMDILSubtarget *ST);
virtual ~AMDILCedarDevice();
virtual size_t getWavefrontSize() const;
private:
virtual void setCaps();
}; // AMDILCedarDevice
// The AMDILRedwoodDevice is the class that represents all of the 'Redwood' based
// devices. This class differs from the base class, in that these devices are
// considered about half of a 'Juniper' device. These are commercially known as
// the HD55XX and HD56XX series of cards.
class AMDILRedwoodDevice : public AMDILEvergreenDevice {
public:
AMDILRedwoodDevice(AMDILSubtarget *ST);
virtual ~AMDILRedwoodDevice();
virtual size_t getWavefrontSize() const;
private:
virtual void setCaps();
}; // AMDILRedwoodDevice
} // namespace llvm
#endif // _AMDILEVERGREENDEVICE_H_


@@ -0,0 +1,450 @@
//==- AMDILFormats.td - AMDIL Instruction Formats ----*- tablegen -*-==//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
//===--------------------------------------------------------------------===//
include "AMDILTokenDesc.td"
//===--------------------------------------------------------------------===//
// The parent IL instruction class that inherits the Instruction class. This
// class sets the corresponding namespace, the out and input dag lists the
// pattern to match to and the string to print out for the assembly printer.
//===--------------------------------------------------------------------===//
class ILFormat<ILOpCode op, dag outs, dag ins, string asmstr, list<dag> pattern>
: Instruction {
let Namespace = "AMDIL";
dag OutOperandList = outs;
dag InOperandList = ins;
ILOpCode operation = op;
let Pattern = pattern;
let AsmString = !strconcat(asmstr, "\n");
let isPseudo = 1;
bit hasIEEEFlag = 0;
bit hasZeroOpFlag = 0;
}
//===--------------------------------------------------------------------===//
// The base class for vector insert instructions. It is a single dest, quad
// source instruction where the last two source operands must be 32bit
// immediate values that encode the swizzle of the source register.
// The src2 and src3 operands must also be inversions of each other, such
// that if src2 is 0x1000300 (x0z0), src3 must be 0x20004 (0y0w). The values
// are encoded as a 32bit integer with each byte representing a swizzle value.
// The encoding is as follows for 32bit register types:
// 0x00 -> '_'
// 0x01 -> 'x'
// 0x02 -> 'y'
// 0x03 -> 'z'
// 0x04 -> 'w'
// 0x05 -> 'x'
// 0x06 -> 'y'
// 0x07 -> 'z'
// 0x08 -> 'w'
// 0x09 -> '0'
// The encoding is as follows for 64bit register types:
// 0x00 -> "__"
// 0x01 -> "xy"
// 0x02 -> "zw"
// 0x03 -> "xy"
// 0x04 -> "zw"
// 0x05 -> "00"
//===--------------------------------------------------------------------===//
class InsertVectorClass<ILOpCode op, RegisterClass DReg, RegisterClass SReg,
SDNode OpNode, string asmstr> :
ILFormat<op, (outs DReg:$dst),
(ins DReg:$src0, SReg:$src1, i32imm:$src2, i32imm:$src3),
!strconcat(asmstr, " $dst, $src0, $src1"),
[(set DReg:$dst, (OpNode DReg:$src0, SReg:$src1,
timm:$src2, timm:$src3))]>;
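The packed swizzle immediates described above can be decoded byte by byte, most-significant byte first, using the 32-bit-register table. An illustrative decoder sketch (note the comment's examples write unused lanes as '0', e.g. "x0z0", while the table maps code 0x00 to '_', so this decoder prints "x_z_" for 0x1000300):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode one packed 32-bit swizzle immediate, per the 32-bit-register
// table above. Codes 0x05..0x08 alias x/y/z/w just as the table does.
std::string decodeSwizzle(std::uint32_t imm) {
  static const char table[] = "_xyzwxyzw0"; // codes 0x00 .. 0x09
  std::string out;
  for (int shift = 24; shift >= 0; shift -= 8) {
    std::uint32_t code = (imm >> shift) & 0xFF;
    out += code <= 9 ? table[code] : '?';
  }
  return out;
}
```

Decoding both immediates from the example and overlaying them shows the inversion property: each lane is selected by exactly one of src2 and src3.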
//===--------------------------------------------------------------------===//
// Class that has one input parameter and one output parameter.
// The basic pattern for this class is "Opcode Dst, Src0" and
// handles the unary math operators.
// It sets the binary tokens ILSrc, ILSrcMod and ILRelAddr, plus the relative
// ILSrc and ILSrcMod, if the addressing is register relative for input and
// output register 0.
//===--------------------------------------------------------------------===//
class OneInOneOut<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: ILFormat<op, outs, ins, asmstr, pattern>
{
ILDst dst_reg;
ILDstMod dst_mod;
ILRelAddr dst_rel;
ILSrc dst_reg_rel;
ILSrcMod dst_reg_rel_mod;
ILSrc src0_reg;
ILSrcMod src0_mod;
ILRelAddr src0_rel;
ILSrc src0_reg_rel;
ILSrcMod src0_reg_rel_mod;
}
//===--------------------------------------------------------------------===//
// A simplified version of the OneInOneOut class where the pattern is standard
// and does not need special cases. This requires that the pattern has an
// SDNode and takes source and destination registers of type RegisterClass.
// This is the standard unary op class.
//===--------------------------------------------------------------------===//
class UnaryOp<ILOpCode op, SDNode OpNode,
RegisterClass dRegs, RegisterClass sRegs>
: OneInOneOut<op, (outs dRegs:$dst), (ins sRegs:$src),
!strconcat(op.Text, " $dst, $src"),
[(set dRegs:$dst, (OpNode sRegs:$src))]>;
//===--------------------------------------------------------------------===//
// This class is similar to the UnaryOp class; however, there is no
// result value to assign.
//===--------------------------------------------------------------------===//
class UnaryOpNoRet<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: ILFormat<op, outs, ins, asmstr, pattern>
{
ILSrc src0_reg;
ILSrcMod src0_mod;
ILRelAddr src0_rel;
ILSrc src0_reg_rel;
ILSrcMod src0_reg_rel_mod;
}
//===--------------------------------------------------------------------===//
// Set of classes that have two input parameters and one output parameter.
// The basic pattern for this class is "Opcode Dst, Src0, Src1" and
// handles the binary math operators and comparison operations.
// It sets the binary tokens ILSrc, ILSrcMod and ILRelAddr, plus the relative
// ILSrc and ILSrcMod, if the addressing is register relative for input
// register 1.
//===--------------------------------------------------------------------===//
class TwoInOneOut<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: OneInOneOut<op, outs, ins, asmstr, pattern>
{
ILSrc src1_reg;
ILSrcMod src1_mod;
ILRelAddr src1_rel;
ILSrc src1_reg_rel;
ILSrcMod src1_reg_rel_mod;
}
//===--------------------------------------------------------------------===//
// A simplification of the TwoInOneOut pattern for Binary Operations.
// This class is a helper class that assumes the simple pattern of
// $dst = op $src0 $src1.
// Other type of matching patterns need to use the TwoInOneOut class.
//===--------------------------------------------------------------------===//
class BinaryOp<ILOpCode op, SDNode OpNode, RegisterClass dReg,
RegisterClass sReg0, RegisterClass sReg1>
: TwoInOneOut<op, (outs dReg:$dst), (ins sReg0:$src0, sReg1:$src1),
!strconcat(op.Text, " $dst, $src0, $src1"),
[(set dReg:$dst, (OpNode sReg0:$src0, sReg1:$src1))]>;
//===--------------------------------------------------------------------===//
// The base class for vector extract instructions. The vector extract
// instructions take as an input value a source register and a 32bit integer
// with the same encoding as specified in InsertVectorClass and produces
// a result with only the swizzled component in the destination register.
//===--------------------------------------------------------------------===//
class ExtractVectorClass<RegisterClass DReg, RegisterClass SReg, SDNode OpNode>
: TwoInOneOut<IL_OP_MOV, (outs DReg:$dst), (ins SReg:$src0, i32imm:$src1),
"mov $dst, $src0",
[(set DReg:$dst, (OpNode SReg:$src0, timm:$src1))]>;
//===--------------------------------------------------------------------===//
// The base class for vector concatenation. This class creates either a vec2
// or a vec4 of 32bit data types or a vec2 of 64bit data types. This is done
// by swizzling either the 'x' or 'xy' components of the source operands
// into the destination register.
//===--------------------------------------------------------------------===//
class VectorConcatClass<RegisterClass Dst, RegisterClass Src, SDNode OpNode>
: TwoInOneOut<IL_OP_I_ADD, (outs Dst:$dst), (ins Src:$src0, Src:$src1),
"iadd $dst, $src0, $src1",
[(set Dst:$dst, (OpNode Src:$src0, Src:$src1))]>;
//===--------------------------------------------------------------------===//
// Similar to the UnaryOpNoRet class, but takes as arguments two input
// operands. Used mainly for barrier instructions on PC platform.
//===--------------------------------------------------------------------===//
class BinaryOpNoRet<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: UnaryOpNoRet<op, outs, ins, asmstr, pattern>
{
ILSrc src1_reg;
ILSrcMod src1_mod;
ILRelAddr src1_rel;
ILSrc src1_reg_rel;
ILSrcMod src1_reg_rel_mod;
}
//===--------------------------------------------------------------------===//
// Set of classes that have three input parameters and one output parameter.
// The basic pattern for this class is "Opcode Dst, Src0, Src1, Src2" and
// handles the mad and conditional mov instructions.
// It sets the binary tokens ILSrc, ILSrcMod and ILRelAddr, plus the relative
// ILSrc and ILSrcMod, if the addressing is register relative.
// This class is the parent class of TernaryOp
//===--------------------------------------------------------------------===//
class ThreeInOneOut<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: TwoInOneOut<op, outs, ins, asmstr, pattern> {
ILSrc src2_reg;
ILSrcMod src2_mod;
ILRelAddr src2_rel;
ILSrc src2_reg_rel;
ILSrcMod src2_reg_rel_mod;
}
//===--------------------------------------------------------------------===//
// The generic version of the three-input pattern uses a standard pattern but
// allows specification of the register classes to further generalize the class.
// This class is mainly used in the generic multiclasses in AMDILMultiClass.td.
//===--------------------------------------------------------------------===//
class TernaryOp<ILOpCode op, SDNode OpNode,
RegisterClass dReg,
RegisterClass sReg0,
RegisterClass sReg1,
RegisterClass sReg2>
: ThreeInOneOut<op, (outs dReg:$dst),
(ins sReg0:$src0, sReg1:$src1, sReg2:$src2),
!strconcat(op.Text, " $dst, $src0, $src1, $src2"),
[(set dReg:$dst,
(OpNode sReg0:$src0, sReg1:$src1, sReg2:$src2))]>;
//===--------------------------------------------------------------------===//
// Set of classes that have four input parameters and one output parameter.
// The basic pattern for this class is "Opcode Dst, Src0, Src1, Src2, Src3".
// It sets the binary tokens ILSrc, ILSrcMod and ILRelAddr, plus the relative
// ILSrc and ILSrcMod, for the fourth source operand if the addressing is
// register relative. This class extends ThreeInOneOut.
//===--------------------------------------------------------------------===//
class FourInOneOut<ILOpCode op, dag outs, dag ins,
string asmstr, list<dag> pattern>
: ThreeInOneOut<op, outs, ins, asmstr, pattern> {
ILSrc src3_reg;
ILSrcMod src3_mod;
ILRelAddr src3_rel;
ILSrc src3_reg_rel;
ILSrcMod src3_reg_rel_mod;
}
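Each step of the OneInOneOut -> TwoInOneOut -> ThreeInOneOut -> FourInOneOut ladder adds the same five-token bundle (register, modifier, relative-address flag, relative register and its modifier) for one more source operand. The accumulation pattern, sketched with hypothetical C++ stand-ins for the TableGen token records:

```cpp
#include <cassert>

// Hypothetical stand-ins for the ILSrc/ILSrcMod/ILRelAddr token records.
struct SrcTokens { int reg = 0, mod = 0, rel = 0, reg_rel = 0, reg_rel_mod = 0; };
struct DstTokens { int reg = 0, mod = 0, rel = 0, reg_rel = 0, reg_rel_mod = 0; };

// Each level of the ladder adds exactly one more source-token bundle.
struct OneInOneOut   { DstTokens dst; SrcTokens src0; };
struct TwoInOneOut   : OneInOneOut   { SrcTokens src1; };
struct ThreeInOneOut : TwoInOneOut   { SrcTokens src2; };
struct FourInOneOut  : ThreeInOneOut { SrcTokens src3; };
```

In the TableGen original the same effect is achieved through class inheritance, so every derived format carries the full token set of its parents.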
//===--------------------------------------------------------------------===//
// A macro class that extends OneInOneOut, tailored for macros only, where
// all the register types are the same.
//===--------------------------------------------------------------------===//
class UnaryMacro<RegisterClass Dst, RegisterClass Src0, SDNode OpNode>
: OneInOneOut<IL_OP_MACRO, (outs Dst:$dst),
(ins Src0:$src0),
"($dst),($src0)",
[(set Dst:$dst, (OpNode Src0:$src0))]>;
//===--------------------------------------------------------------------===//
// A macro class that extends TwoInOneOut, tailored for macros only, where
// all the register types are the same.
//===--------------------------------------------------------------------===//
class BinaryMacro<RegisterClass Dst,
RegisterClass Src0,
RegisterClass Src1,
SDNode OpNode>
: TwoInOneOut<IL_OP_MACRO, (outs Dst:$dst),
(ins Src0: $src0, Src1:$src1),
"($dst),($src0, $src1)",
[(set Dst:$dst, (OpNode Src0:$src0, Src1:$src1))]>;
//===--------------------------------------------------------------------===//
// Classes for dealing with atomic instructions w/ 32bit pointers
//===--------------------------------------------------------------------===//
class Append<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$id),
!strconcat(op.Text, !strconcat(idType," $dst")),
[(set GPRI32:$dst, (intr ADDR:$id))]>;
// TODO: Need to get this working without dst...
class AppendNoRet<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$id),
!strconcat(op.Text, !strconcat(idType," $dst")),
[(set GPRI32:$dst, (intr ADDR:$id))]>;
class UniAtom<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr")),
[(set GPRI32:$dst, (intr ADDR:$ptr, timm:$id))]>;
// TODO: Need to get this working without dst...
class UniAtomNoRet<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst), (ins MEMI32:$ptr, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr")),
[(set GPRI32:$dst, (intr ADDR:$ptr, timm:$id))]>;
class BinAtom<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, GPRI32:$src, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, timm:$id))]>;
// TODO: Need to get this working without dst...
class BinAtomNoRet<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst), (ins MEMI32:$ptr, GPRI32:$src, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, timm:$id))]>;
class TriAtom<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src, $src1")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
class CmpXChg<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src1, $src")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
// TODO: Need to get this working without dst...
class TriAtomNoRet<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src, $src1")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
// TODO: Need to get this working without dst...
class CmpXChgNoRet<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI32:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src1, $src")),
[(set GPRI32:$dst, (intr ADDR:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
//===--------------------------------------------------------------------===//
// Classes for dealing with atomic instructions w/ 64bit pointers
//===--------------------------------------------------------------------===//
class Append64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$id),
!strconcat(op.Text, !strconcat(idType," $dst")),
[(set GPRI32:$dst, (intr ADDR64:$id))]>;
// TODO: Need to get this working without dst...
class AppendNoRet64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$id),
!strconcat(op.Text, !strconcat(idType," $dst")),
[(set GPRI32:$dst, (intr ADDR64:$id))]>;
class UniAtom64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, timm:$id))]>;
// TODO: Need to get this working without dst...
class UniAtomNoRet64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst), (ins MEMI64:$ptr, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, timm:$id))]>;
class BinAtom64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, GPRI32:$src, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, timm:$id))]>;
// TODO: Need to get this working without dst...
class BinAtomNoRet64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst), (ins MEMI64:$ptr, GPRI32:$src, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, timm:$id))]>;
class TriAtom64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src, $src1")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
class CmpXChg64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $dst, $ptr, $src1, $src")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
// TODO: Need to get this working without dst...
class TriAtomNoRet64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src, $src1")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
// TODO: Need to get this working without dst...
class CmpXChgNoRet64<ILOpCode op, string idType, SDNode intr>
: ILFormat<op, (outs GPRI32:$dst),
(ins MEMI64:$ptr, GPRI32:$src, GPRI32:$src1, i32imm:$id),
!strconcat(op.Text, !strconcat(idType," $ptr, $src1, $src")),
[(set GPRI32:$dst, (intr ADDR64:$ptr, GPRI32:$src, GPRI32:$src1, timm:$id))]>;
//===--------------------------------------------------------------------===//
// Intrinsic classes
// Generic versions of the above classes but for Target specific intrinsics
// instead of SDNode patterns.
//===--------------------------------------------------------------------===//
let TargetPrefix = "AMDIL", isTarget = 1 in {
class VoidIntLong :
Intrinsic<[llvm_i64_ty], [], []>;
class VoidIntInt :
Intrinsic<[llvm_i32_ty], [], []>;
class VoidIntBool :
Intrinsic<[llvm_i32_ty], [], []>;
class UnaryIntInt :
Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>], []>;
class UnaryIntFloat :
Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>], []>;
class ConvertIntFTOI :
Intrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty], []>;
class ConvertIntITOF :
Intrinsic<[llvm_anyfloat_ty], [llvm_anyint_ty], []>;
class UnaryIntNoRetInt :
Intrinsic<[], [llvm_anyint_ty], []>;
class UnaryIntNoRetFloat :
Intrinsic<[], [llvm_anyfloat_ty], []>;
class BinaryIntInt :
Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>, LLVMMatchType<0>], []>;
class BinaryIntFloat :
Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>, LLVMMatchType<0>], []>;
class BinaryIntNoRetInt :
Intrinsic<[], [llvm_anyint_ty, LLVMMatchType<0>], []>;
class BinaryIntNoRetFloat :
Intrinsic<[], [llvm_anyfloat_ty, LLVMMatchType<0>], []>;
class TernaryIntInt :
Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>,
LLVMMatchType<0>, LLVMMatchType<0>], []>;
class TernaryIntFloat :
Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
LLVMMatchType<0>, LLVMMatchType<0>], []>;
class QuaternaryIntInt :
Intrinsic<[llvm_anyint_ty], [LLVMMatchType<0>,
LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>], []>;
class UnaryAtomicInt :
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty, llvm_i32_ty], [IntrReadWriteArgMem]>;
class BinaryAtomicInt :
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty], [IntrReadWriteArgMem]>;
class TernaryAtomicInt :
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty]>;
class UnaryAtomicIntNoRet :
Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty], [IntrReadWriteArgMem]>;
class BinaryAtomicIntNoRet :
Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty], [IntrReadWriteArgMem]>;
class TernaryAtomicIntNoRet :
Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty], [IntrReadWriteArgMem]>;
}


@@ -0,0 +1,53 @@
//===----------------------- AMDILFrameLowering.cpp -----------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface to describe a layout of a stack frame on a AMDIL target machine
//
//===----------------------------------------------------------------------===//
#include "AMDILFrameLowering.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
using namespace llvm;
AMDILFrameLowering::AMDILFrameLowering(StackDirection D, unsigned StackAl,
int LAO, unsigned TransAl)
: TargetFrameLowering(D, StackAl, LAO, TransAl)
{
}
AMDILFrameLowering::~AMDILFrameLowering()
{
}
/// getFrameIndexOffset - Returns the displacement from the frame register to
/// the stack frame of the specified index.
int AMDILFrameLowering::getFrameIndexOffset(const MachineFunction &MF,
int FI) const {
const MachineFrameInfo *MFI = MF.getFrameInfo();
return MFI->getObjectOffset(FI);
}
const TargetFrameLowering::SpillSlot *
AMDILFrameLowering::getCalleeSavedSpillSlots(unsigned &NumEntries) const
{
NumEntries = 0;
return 0;
}
void
AMDILFrameLowering::emitPrologue(MachineFunction &MF) const
{
}
void
AMDILFrameLowering::emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const
{
}
bool
AMDILFrameLowering::hasFP(const MachineFunction &MF) const
{
return false;
}


@@ -0,0 +1,46 @@
//===--------------------- AMDILFrameLowering.h -----------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface to describe a layout of a stack frame on a AMDIL target machine
//
//===----------------------------------------------------------------------===//
#ifndef _AMDILFRAME_LOWERING_H_
#define _AMDILFRAME_LOWERING_H_
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Target/TargetFrameLowering.h"
/// Information about the stack frame layout on the AMDIL targets. It holds
/// the direction of the stack growth, the known stack alignment on entry to
/// each function, and the offset to the locals area.
/// See TargetFrameInfo for more comments.
namespace llvm {
class AMDILFrameLowering : public TargetFrameLowering {
public:
AMDILFrameLowering(StackDirection D, unsigned StackAl, int LAO, unsigned
TransAl = 1);
virtual ~AMDILFrameLowering();
virtual int getFrameIndexOffset(const MachineFunction &MF,
int FI) const;
virtual const SpillSlot *
getCalleeSavedSpillSlots(unsigned &NumEntries) const;
virtual void emitPrologue(MachineFunction &MF) const;
virtual void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const;
virtual bool hasFP(const MachineFunction &MF) const;
}; // class AMDILFrameLowering
} // namespace llvm
#endif // _AMDILFRAME_LOWERING_H_

File diff suppressed because it is too large.


@@ -0,0 +1,256 @@
//===-- AMDILGlobalManager.h - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Class that handles parsing and storing global variables that are relevant to
// the compilation of the module.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDILGLOBALMANAGER_H_
#define _AMDILGLOBALMANAGER_H_
#include "AMDIL.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Module.h"
#include "llvm/Support/raw_ostream.h"
#include <set>
#include <string>
#define CB_BASE_OFFSET 2
namespace llvm {
class PointerType;
class AMDILKernelManager;
class AMDILSubtarget;
class TypeSymbolTable;
class Argument;
class GlobalValue;
class MachineFunction;
/// structure that holds information for a single local/region address array
typedef struct _arrayMemRec {
uint32_t vecSize; // size of each vector
uint32_t offset; // offset into the memory section
bool isHW; // flag to specify if HW is used or SW is used
bool isRegion; // flag to specify if GDS is used or not
} arraymem;
/// Structure that holds information for all local/region address
/// arrays in the kernel
typedef struct _localArgRec {
llvm::SmallVector<arraymem *, DEFAULT_VEC_SLOTS> local;
std::string name; // Kernel Name
} localArg;
/// structure that holds information about a constant address
/// space pointer that is a kernel argument
typedef struct _constPtrRec {
const Value *base;
uint32_t size;
uint32_t offset;
uint32_t cbNum; // value of 0 means that it does not use hw CB
bool isArray;
bool isArgument;
bool usesHardware;
std::string name;
} constPtr;
/// Structure that holds information for each kernel argument
typedef struct _kernelArgRec {
uint32_t reqGroupSize[3];
uint32_t reqRegionSize[3];
llvm::SmallVector<uint32_t, DEFAULT_VEC_SLOTS> argInfo;
bool mHasRWG;
bool mHasRWR;
} kernelArg;
/// Structure that holds information for each kernel
typedef struct _kernelRec {
mutable uint32_t curSize;
mutable uint32_t curRSize;
mutable uint32_t curHWSize;
mutable uint32_t curHWRSize;
uint32_t constSize;
kernelArg *sgv;
localArg *lvgv;
llvm::SmallVector<struct _constPtrRec, DEFAULT_VEC_SLOTS> constPtr;
uint32_t constSizes[HW_MAX_NUM_CB];
llvm::SmallSet<uint32_t, OPENCL_MAX_READ_IMAGES> readOnly;
llvm::SmallSet<uint32_t, OPENCL_MAX_WRITE_IMAGES> writeOnly;
llvm::SmallVector<std::pair<uint32_t, const Constant *>,
DEFAULT_VEC_SLOTS> CPOffsets;
} kernel;
class AMDILGlobalManager {
public:
AMDILGlobalManager(bool debugMode = false);
~AMDILGlobalManager();
/// Process the given module and parse out the global variable metadata passed
/// down from the frontend-compiler
void processModule(const Module &MF, const AMDILTargetMachine* mTM);
/// Returns whether the current name is the name of a kernel function or a
/// normal function
bool isKernel(const llvm::StringRef &name) const;
/// Returns true if the image ID corresponds to a read only image.
bool isReadOnlyImage(const llvm::StringRef &name, uint32_t iID) const;
/// Returns true if the image ID corresponds to a write only image.
bool isWriteOnlyImage(const llvm::StringRef &name, uint32_t iID) const;
/// Returns the number of write only images for the kernel.
uint32_t getNumWriteImages(const llvm::StringRef &name) const;
/// Gets the group size of the kernel for the given dimension.
uint32_t getLocal(const llvm::StringRef &name, uint32_t dim) const;
/// Gets the region size of the kernel for the given dimension.
uint32_t getRegion(const llvm::StringRef &name, uint32_t dim) const;
/// Get the region memory size in 1d for the given function/kernel.
uint32_t getRegionSize(const llvm::StringRef &name) const;
/// Get the local memory size in 1d for the given function/kernel.
uint32_t getLocalSize(const llvm::StringRef &name) const;
/// Get the max group size in 1d for the given function/kernel.
uint32_t getMaxGroupSize(const llvm::StringRef &name) const;
/// Get the max region size in 1d for the given function/kernel.
uint32_t getMaxRegionSize(const llvm::StringRef &name) const;
/// Get the constant memory size in 1d for the given function/kernel.
uint32_t getConstSize(const llvm::StringRef &name) const;
/// Get the HW local size in 1d for the given function/kernel. We need to
/// separate SW local and HW local for the case where some local memory is
/// emulated in global and some is using the hardware features. The main
/// problem is that in OpenCL 1.0/1.1 cl_khr_byte_addressable_store allows
/// these actions to happen on all memory spaces, but the hardware can only
/// write byte address stores to UAV and LDS, not GDS or Stack.
uint32_t getHWLocalSize(const llvm::StringRef &name) const;
/// Get the HW region size in 1d for the given function/kernel.
uint32_t getHWRegionSize(const llvm::StringRef &name) const;
/// Get the offset of the array for the kernel.
int32_t getArrayOffset(const llvm::StringRef &name) const;
/// Get the offset of the const memory for the kernel.
int32_t getConstOffset(const llvm::StringRef &name) const;
/// Get the boolean value if this particular constant uses HW or not.
bool getConstHWBit(const llvm::StringRef &name) const;
/// Get a reference to the kernel metadata information for the given function
/// name.
const kernel &getKernel(const llvm::StringRef &name) const;
/// Returns whether a reqd_workgroup_size attribute has been used or not.
bool hasRWG(const llvm::StringRef &name) const;
/// Returns whether a reqd_workregion_size attribute has been used or not.
bool hasRWR(const llvm::StringRef &name) const;
/// Dump the data section to the output stream for the given kernel.
void dumpDataSection(llvm::raw_ostream &O, AMDILKernelManager *km);
/// Iterate through the constants that are global to the compilation unit.
StringMap<constPtr>::iterator consts_begin();
StringMap<constPtr>::iterator consts_end();
/// Query if the kernel has a byte store.
bool byteStoreExists(llvm::StringRef S) const;
/// Query if the kernel and argument uses hardware constant memory.
bool usesHWConstant(const kernel &krnl, const llvm::StringRef &arg);
/// Query if the constant pointer is an argument.
bool isConstPtrArgument(const kernel &krnl, const llvm::StringRef &arg);
/// Query if the constant pointer is an array that is globally scoped.
bool isConstPtrArray(const kernel &krnl, const llvm::StringRef &arg);
/// Query the size of the constant pointer.
uint32_t getConstPtrSize(const kernel &krnl, const llvm::StringRef &arg);
/// Query the offset of the constant pointer.
uint32_t getConstPtrOff(const kernel &krnl, const llvm::StringRef &arg);
/// Query the constant buffer number for a constant pointer.
uint32_t getConstPtrCB(const kernel &krnl, const llvm::StringRef &arg);
/// Query the Value* that the constant pointer originates from.
const Value *getConstPtrValue(const kernel &krnl, const llvm::StringRef &arg);
/// Get the ID of the argument.
int32_t getArgID(const Argument *arg);
/// Get the unique function ID for the specific function name and create a new
/// unique ID if it is not found.
uint32_t getOrCreateFunctionID(const GlobalValue* func);
uint32_t getOrCreateFunctionID(const std::string& func);
/// Calculate the offsets of the constant pool for the given kernel and
/// machine function.
void calculateCPOffsets(const MachineFunction *MF, kernel &krnl);
/// Print the global manager to the output stream.
void print(llvm::raw_ostream& O);
/// Dump the global manager to the output stream - debug use.
void dump();
private:
/// Various functions that parse global value information and store it in
/// the global manager. Parsing everything up front may require more space
/// than parsing on demand, but it allows data that is requested multiple
/// times to be cached.
kernelArg parseSGV(const GlobalValue *GV);
localArg parseLVGV(const GlobalValue *GV);
void parseGlobalAnnotate(const GlobalValue *G);
void parseImageAnnotate(const GlobalValue *G);
void parseConstantPtrAnnotate(const GlobalValue *G);
void printConstantValue(const Constant *CAval,
llvm::raw_ostream& O,
bool asByte);
void parseKernelInformation(const Value *V);
void parseAutoArray(const GlobalValue *G, bool isRegion);
void parseConstantPtr(const GlobalValue *G);
void allocateGlobalCB();
void dumpDataToCB(llvm::raw_ostream &O, AMDILKernelManager *km, uint32_t id);
bool checkConstPtrsUseHW(Module::const_iterator *F);
llvm::StringMap<arraymem> mArrayMems;
llvm::StringMap<localArg> mLocalArgs;
llvm::StringMap<kernelArg> mKernelArgs;
llvm::StringMap<kernel> mKernels;
llvm::StringMap<constPtr> mConstMems;
llvm::StringMap<uint32_t> mFuncNames;
llvm::DenseMap<const GlobalValue*, uint32_t> mFuncPtrNames;
llvm::DenseMap<uint32_t, llvm::StringRef> mImageNameMap;
std::set<llvm::StringRef> mByteStore;
std::set<llvm::StringRef> mIgnoreStr;
llvm::DenseMap<const Argument *, int32_t> mArgIDMap;
const char *symTab;
const AMDILSubtarget *mSTM;
size_t mOffset;
uint32_t mReservedBuffs;
uint32_t mCurrentCPOffset;
bool mDebugMode;
};
} // namespace llvm
#endif // __AMDILGLOBALMANAGER_H_

//===----------- AMDILIOExpansion.h - IO Expansion Pass -------------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// The AMDIL IO Expansion class expands pseudo IO instructions into sequences
// of instructions that produce the correct results. These instructions are
// not expanded earlier in the backend because every pass that runs before
// this one may assume it is still able to generate load/store instructions.
// Consequently, only passes that generate no load/store instructions may be
// scheduled after this pass.
//===----------------------------------------------------------------------===//
#ifndef _AMDILIOEXPANSION_H_
#define _AMDILIOEXPANSION_H_
#undef DEBUG_TYPE
#undef DEBUGME
#define DEBUG_TYPE "IOExpansion"
#if !defined(NDEBUG)
#define DEBUGME (DebugFlag && isCurrentDebugType(DEBUG_TYPE))
#else
#define DEBUGME (false)
#endif
#include "AMDIL.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetMachine.h"
namespace llvm {
class MachineFunction;
class AMDILKernelManager;
class AMDILMachineFunctionInfo;
class AMDILSubtarget;
class MachineInstr;
class Constant;
class TargetInstrInfo;
class Type;
typedef enum {
NO_PACKING = 0,
PACK_V2I8,
PACK_V4I8,
PACK_V2I16,
PACK_V4I16,
UNPACK_V2I8,
UNPACK_V4I8,
UNPACK_V2I16,
UNPACK_V4I16,
UNPACK_LAST
} REG_PACKED_TYPE;
class AMDILIOExpansion : public MachineFunctionPass
{
public:
virtual ~AMDILIOExpansion();
virtual const char* getPassName() const;
bool runOnMachineFunction(MachineFunction &MF);
static char ID;
protected:
AMDILIOExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
TargetMachine &TM;
//
// @param MI Machine instruction to check.
// @brief checks to see if the machine instruction
// is an I/O instruction or not.
//
// @return true if I/O, false otherwise.
//
virtual bool
isIOInstruction(MachineInstr *MI);
// Wrapper function that calls the appropriate I/O
// expansion function based on the instruction type.
virtual void
expandIOInstruction(MachineInstr *MI);
virtual void
expandGlobalStore(MachineInstr *MI) = 0;
virtual void
expandLocalStore(MachineInstr *MI) = 0;
virtual void
expandRegionStore(MachineInstr *MI) = 0;
virtual void
expandPrivateStore(MachineInstr *MI) = 0;
virtual void
expandGlobalLoad(MachineInstr *MI) = 0;
virtual void
expandRegionLoad(MachineInstr *MI) = 0;
virtual void
expandLocalLoad(MachineInstr *MI) = 0;
virtual void
expandPrivateLoad(MachineInstr *MI) = 0;
virtual void
expandConstantLoad(MachineInstr *MI) = 0;
virtual void
expandConstantPoolLoad(MachineInstr *MI) = 0;
bool
isAddrCalcInstr(MachineInstr *MI);
bool
isExtendLoad(MachineInstr *MI);
bool
isHardwareRegion(MachineInstr *MI);
bool
isHardwareLocal(MachineInstr *MI);
bool
isPackedData(MachineInstr *MI);
bool
isStaticCPLoad(MachineInstr *MI);
bool
isNbitType(Type *MI, uint32_t nBits, bool isScalar = true);
bool
isHardwareInst(MachineInstr *MI);
uint32_t
getMemorySize(MachineInstr *MI);
REG_PACKED_TYPE
getPackedID(MachineInstr *MI);
uint32_t
getShiftSize(MachineInstr *MI);
uint32_t
getPointerID(MachineInstr *MI);
void
expandTruncData(MachineInstr *MI);
void
expandLoadStartCode(MachineInstr *MI);
virtual void
expandStoreSetupCode(MachineInstr *MI) = 0;
void
expandAddressCalc(MachineInstr *MI);
void
expandLongExtend(MachineInstr *MI,
uint32_t numComponents, uint32_t size, bool signedShift);
void
expandLongExtendSub32(MachineInstr *MI,
unsigned SHLop, unsigned SHRop, unsigned USHRop,
unsigned SHLimm, uint64_t SHRimm, unsigned USHRimm,
unsigned LCRop, bool signedShift);
void
expandIntegerExtend(MachineInstr *MI, unsigned, unsigned, unsigned);
void
expandExtendLoad(MachineInstr *MI);
virtual void
expandPackedData(MachineInstr *MI) = 0;
void
emitCPInst(MachineInstr* MI, const Constant* C,
AMDILKernelManager* KM, int swizzle, bool ExtFPLoad);
bool mDebug;
const AMDILSubtarget *mSTM;
AMDILKernelManager *mKM;
MachineBasicBlock *mBB;
AMDILMachineFunctionInfo *mMFI;
const TargetInstrInfo *mTII;
bool saveInst;
private:
void
emitStaticCPLoad(MachineInstr* MI, int swizzle, int id,
bool ExtFPLoad);
}; // class AMDILIOExpansion
// Intermediate class that holds I/O code expansion that is common to the
// 7XX, Evergreen and Northern Islands families of chips.
class AMDIL789IOExpansion : public AMDILIOExpansion {
public:
virtual ~AMDIL789IOExpansion();
virtual const char* getPassName() const;
protected:
AMDIL789IOExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
virtual void
expandGlobalStore(MachineInstr *MI) = 0;
virtual void
expandLocalStore(MachineInstr *MI) = 0;
virtual void
expandRegionStore(MachineInstr *MI) = 0;
virtual void
expandGlobalLoad(MachineInstr *MI) = 0;
virtual void
expandRegionLoad(MachineInstr *MI) = 0;
virtual void
expandLocalLoad(MachineInstr *MI) = 0;
virtual void
expandPrivateStore(MachineInstr *MI);
virtual void
expandConstantLoad(MachineInstr *MI);
virtual void
expandPrivateLoad(MachineInstr *MI);
virtual void
expandConstantPoolLoad(MachineInstr *MI);
void
expandStoreSetupCode(MachineInstr *MI);
virtual void
expandPackedData(MachineInstr *MI);
private:
void emitVectorAddressCalc(MachineInstr *MI, bool is32bit,
bool needsSelect);
void emitVectorSwitchWrite(MachineInstr *MI, bool is32bit);
void emitComponentExtract(MachineInstr *MI, unsigned flag, unsigned src,
unsigned dst, bool beforeInst);
void emitDataLoadSelect(MachineInstr *MI);
}; // class AMDIL789IOExpansion
// Class that handles I/O emission for the 7XX family of devices.
class AMDIL7XXIOExpansion : public AMDIL789IOExpansion {
public:
AMDIL7XXIOExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
~AMDIL7XXIOExpansion();
const char* getPassName() const;
protected:
void
expandGlobalStore(MachineInstr *MI);
void
expandLocalStore(MachineInstr *MI);
void
expandRegionStore(MachineInstr *MI);
void
expandGlobalLoad(MachineInstr *MI);
void
expandRegionLoad(MachineInstr *MI);
void
expandLocalLoad(MachineInstr *MI);
}; // class AMDIL7XXIOExpansion
// Class that handles image functions to expand them into the
// correct set of I/O instructions.
class AMDILImageExpansion : public AMDIL789IOExpansion {
public:
AMDILImageExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
virtual ~AMDILImageExpansion();
protected:
//
// @param MI Instruction iterator that has the sample instruction
// that needs to be taken care of.
// @brief transforms the __amdil_sample_data function call into a
// sample instruction in IL.
//
// @warning This function only works correctly if all functions get
// inlined
//
virtual void
expandImageLoad(MachineBasicBlock *BB, MachineInstr *MI);
//
// @param MI Instruction iterator that has the write instruction that
// needs to be taken care of.
// @brief transforms the __amdil_write_data function call into a
// simple UAV write instruction in IL.
//
// @warning This function only works correctly if all functions get
// inlined
//
virtual void
expandImageStore(MachineBasicBlock *BB, MachineInstr *MI);
//
// @param MI Instruction iterator that has the image parameter
// instruction
// @brief transforms the __amdil_get_image_params function call into
// a copy of data from a specific constant buffer to the register
//
// @warning This function only works correctly if all functions get
// inlined
//
virtual void
expandImageParam(MachineBasicBlock *BB, MachineInstr *MI);
//
// @param MI Instruction that points to the image
// @brief transforms __amdil_sample_data into a sequence of
// if/else that selects the correct sample instruction.
//
// @warning This function is inefficient but works even when
// functions are not inlined.
//
virtual void
expandInefficientImageLoad(MachineBasicBlock *BB, MachineInstr *MI);
private:
AMDILImageExpansion(); // Do not implement.
}; // class AMDILImageExpansion
// Class that expands IO instructions for the Evergreen and Northern
// Islands families of devices.
class AMDILEGIOExpansion : public AMDILImageExpansion {
public:
AMDILEGIOExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
virtual ~AMDILEGIOExpansion();
const char* getPassName() const;
protected:
virtual bool
isIOInstruction(MachineInstr *MI);
virtual void
expandIOInstruction(MachineInstr *MI);
bool
isImageIO(MachineInstr *MI);
virtual void
expandGlobalStore(MachineInstr *MI);
void
expandLocalStore(MachineInstr *MI);
void
expandRegionStore(MachineInstr *MI);
virtual void
expandGlobalLoad(MachineInstr *MI);
void
expandRegionLoad(MachineInstr *MI);
void
expandLocalLoad(MachineInstr *MI);
virtual bool
isCacheableOp(MachineInstr *MI);
void
expandStoreSetupCode(MachineInstr *MI);
void
expandPackedData(MachineInstr *MI);
private:
bool
isArenaOp(MachineInstr *MI);
void
expandArenaSetup(MachineInstr *MI);
}; // class AMDILEGIOExpansion
} // namespace llvm
#endif // _AMDILIOEXPANSION_H_

//===-- AMDILISelDAGToDAG.cpp - A dag to dag inst selector for AMDIL ------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file defines an instruction selector for the AMDIL target.
//
//===----------------------------------------------------------------------===//
#include "AMDILDevices.h"
#include "AMDILTargetMachine.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/Support/Compiler.h"
using namespace llvm;
//===----------------------------------------------------------------------===//
// Instruction Selector Implementation
//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// AMDILDAGToDAGISel - AMDIL specific code to select AMDIL machine instructions
// for SelectionDAG operations.
//
namespace {
class AMDILDAGToDAGISel : public SelectionDAGISel {
// Subtarget - Keep a pointer to the AMDIL Subtarget around so that we can
// make the right decision when generating code for different targets.
const AMDILSubtarget &Subtarget;
public:
AMDILDAGToDAGISel(AMDILTargetMachine &TM AMDIL_OPT_LEVEL_DECL);
virtual ~AMDILDAGToDAGISel();
inline SDValue getSmallIPtrImm(unsigned Imm);
SDNode *Select(SDNode *N);
// Complex pattern selectors
bool SelectADDRParam(SDValue Addr, SDValue& R1, SDValue& R2);
bool SelectADDR(SDValue N, SDValue &R1, SDValue &R2);
bool SelectADDR64(SDValue N, SDValue &R1, SDValue &R2);
static bool isGlobalStore(const StoreSDNode *N);
static bool isPrivateStore(const StoreSDNode *N);
static bool isLocalStore(const StoreSDNode *N);
static bool isRegionStore(const StoreSDNode *N);
static bool isCPLoad(const LoadSDNode *N);
static bool isConstantLoad(const LoadSDNode *N, int cbID);
static bool isGlobalLoad(const LoadSDNode *N);
static bool isPrivateLoad(const LoadSDNode *N);
static bool isLocalLoad(const LoadSDNode *N);
static bool isRegionLoad(const LoadSDNode *N);
virtual const char *getPassName() const;
private:
SDNode *xformAtomicInst(SDNode *N);
// Include the pieces autogenerated from the target description.
#include "AMDILGenDAGISel.inc"
};
} // end anonymous namespace
// createAMDILISelDag - This pass converts a legalized DAG into an
// AMDIL-specific DAG, ready for instruction scheduling.
//
FunctionPass *llvm::createAMDILISelDag(AMDILTargetMachine &TM
AMDIL_OPT_LEVEL_DECL) {
return new AMDILDAGToDAGISel(TM AMDIL_OPT_LEVEL_VAR);
}
AMDILDAGToDAGISel::AMDILDAGToDAGISel(AMDILTargetMachine &TM
AMDIL_OPT_LEVEL_DECL)
: SelectionDAGISel(TM AMDIL_OPT_LEVEL_VAR), Subtarget(TM.getSubtarget<AMDILSubtarget>())
{
}
AMDILDAGToDAGISel::~AMDILDAGToDAGISel() {
}
SDValue AMDILDAGToDAGISel::getSmallIPtrImm(unsigned int Imm) {
return CurDAG->getTargetConstant(Imm, MVT::i32);
}
bool AMDILDAGToDAGISel::SelectADDRParam(
SDValue Addr, SDValue& R1, SDValue& R2) {
if (Addr.getOpcode() == ISD::FrameIndex) {
if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
R1 = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i32);
R2 = CurDAG->getTargetConstant(0, MVT::i32);
} else {
R1 = Addr;
R2 = CurDAG->getTargetConstant(0, MVT::i32);
}
} else if (Addr.getOpcode() == ISD::ADD) {
R1 = Addr.getOperand(0);
R2 = Addr.getOperand(1);
} else {
R1 = Addr;
R2 = CurDAG->getTargetConstant(0, MVT::i32);
}
return true;
}
bool AMDILDAGToDAGISel::SelectADDR(SDValue Addr, SDValue& R1, SDValue& R2) {
if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
Addr.getOpcode() == ISD::TargetGlobalAddress) {
return false;
}
return SelectADDRParam(Addr, R1, R2);
}
bool AMDILDAGToDAGISel::SelectADDR64(SDValue Addr, SDValue& R1, SDValue& R2) {
if (Addr.getOpcode() == ISD::TargetExternalSymbol ||
Addr.getOpcode() == ISD::TargetGlobalAddress) {
return false;
}
if (Addr.getOpcode() == ISD::FrameIndex) {
if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(Addr)) {
R1 = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i64);
R2 = CurDAG->getTargetConstant(0, MVT::i64);
} else {
R1 = Addr;
R2 = CurDAG->getTargetConstant(0, MVT::i64);
}
} else if (Addr.getOpcode() == ISD::ADD) {
R1 = Addr.getOperand(0);
R2 = Addr.getOperand(1);
} else {
R1 = Addr;
R2 = CurDAG->getTargetConstant(0, MVT::i64);
}
return true;
}
SDNode *AMDILDAGToDAGISel::Select(SDNode *N) {
unsigned int Opc = N->getOpcode();
if (N->isMachineOpcode()) {
return NULL; // Already selected.
}
switch (Opc) {
default: break;
case ISD::FrameIndex:
{
if (FrameIndexSDNode *FIN = dyn_cast<FrameIndexSDNode>(N)) {
unsigned int FI = FIN->getIndex();
EVT OpVT = N->getValueType(0);
unsigned int NewOpc = AMDIL::MOVE_i32;
SDValue TFI = CurDAG->getTargetFrameIndex(FI, MVT::i32);
return CurDAG->SelectNodeTo(N, NewOpc, OpVT, TFI);
}
}
break;
}
// For all atomic instructions, we need to add a constant
// operand that stores the resource ID in the instruction.
if (Opc > AMDILISD::ADDADDR && Opc < AMDILISD::APPEND_ALLOC) {
N = xformAtomicInst(N);
}
return SelectCode(N);
}
bool AMDILDAGToDAGISel::isGlobalStore(const StoreSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::GLOBAL_ADDRESS);
}
bool AMDILDAGToDAGISel::isPrivateStore(const StoreSDNode *N) {
return (!check_type(N->getSrcValue(), AMDILAS::LOCAL_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::GLOBAL_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::REGION_ADDRESS));
}
bool AMDILDAGToDAGISel::isLocalStore(const StoreSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::LOCAL_ADDRESS);
}
bool AMDILDAGToDAGISel::isRegionStore(const StoreSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::REGION_ADDRESS);
}
bool AMDILDAGToDAGISel::isConstantLoad(const LoadSDNode *N, int cbID) {
if (check_type(N->getSrcValue(), AMDILAS::CONSTANT_ADDRESS)) {
return true;
}
// Check MMO and its value for null before dereferencing them.
MachineMemOperand *MMO = N->getMemOperand();
if (!MMO || !MMO->getValue()) {
return false;
}
const Value *V = MMO->getValue();
const Value *BV = getBasePointerValue(V);
if (dyn_cast<GlobalValue>(V) || (BV && dyn_cast<GlobalValue>(BV))) {
return check_type(N->getSrcValue(), AMDILAS::PRIVATE_ADDRESS);
}
return false;
}
bool AMDILDAGToDAGISel::isGlobalLoad(const LoadSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::GLOBAL_ADDRESS);
}
bool AMDILDAGToDAGISel::isLocalLoad(const LoadSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::LOCAL_ADDRESS);
}
bool AMDILDAGToDAGISel::isRegionLoad(const LoadSDNode *N) {
return check_type(N->getSrcValue(), AMDILAS::REGION_ADDRESS);
}
bool AMDILDAGToDAGISel::isCPLoad(const LoadSDNode *N) {
MachineMemOperand *MMO = N->getMemOperand();
if (check_type(N->getSrcValue(), AMDILAS::PRIVATE_ADDRESS)) {
if (MMO) {
const Value *V = MMO->getValue();
const PseudoSourceValue *PSV = dyn_cast<PseudoSourceValue>(V);
if (PSV && PSV == PseudoSourceValue::getConstantPool()) {
return true;
}
}
}
return false;
}
bool AMDILDAGToDAGISel::isPrivateLoad(const LoadSDNode *N) {
if (check_type(N->getSrcValue(), AMDILAS::PRIVATE_ADDRESS)) {
// Check to make sure we are not a constant pool load or a constant load
// that is marked as a private load
if (isCPLoad(N) || isConstantLoad(N, -1)) {
return false;
}
}
if (!check_type(N->getSrcValue(), AMDILAS::LOCAL_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::GLOBAL_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::REGION_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::CONSTANT_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::PARAM_D_ADDRESS)
&& !check_type(N->getSrcValue(), AMDILAS::PARAM_I_ADDRESS))
{
return true;
}
return false;
}
const char *AMDILDAGToDAGISel::getPassName() const {
return "AMDIL DAG->DAG Pattern Instruction Selection";
}
SDNode*
AMDILDAGToDAGISel::xformAtomicInst(SDNode *N)
{
uint32_t addVal = 1;
bool addOne = false;
// bool bitCastToInt = (N->getValueType(0) == MVT::f32);
unsigned opc = N->getOpcode();
switch (opc) {
default: return N;
case AMDILISD::ATOM_G_ADD:
case AMDILISD::ATOM_G_AND:
case AMDILISD::ATOM_G_MAX:
case AMDILISD::ATOM_G_UMAX:
case AMDILISD::ATOM_G_MIN:
case AMDILISD::ATOM_G_UMIN:
case AMDILISD::ATOM_G_OR:
case AMDILISD::ATOM_G_SUB:
case AMDILISD::ATOM_G_RSUB:
case AMDILISD::ATOM_G_XCHG:
case AMDILISD::ATOM_G_XOR:
case AMDILISD::ATOM_G_ADD_NORET:
case AMDILISD::ATOM_G_AND_NORET:
case AMDILISD::ATOM_G_MAX_NORET:
case AMDILISD::ATOM_G_UMAX_NORET:
case AMDILISD::ATOM_G_MIN_NORET:
case AMDILISD::ATOM_G_UMIN_NORET:
case AMDILISD::ATOM_G_OR_NORET:
case AMDILISD::ATOM_G_SUB_NORET:
case AMDILISD::ATOM_G_RSUB_NORET:
case AMDILISD::ATOM_G_XCHG_NORET:
case AMDILISD::ATOM_G_XOR_NORET:
case AMDILISD::ATOM_L_ADD:
case AMDILISD::ATOM_L_AND:
case AMDILISD::ATOM_L_MAX:
case AMDILISD::ATOM_L_UMAX:
case AMDILISD::ATOM_L_MIN:
case AMDILISD::ATOM_L_UMIN:
case AMDILISD::ATOM_L_OR:
case AMDILISD::ATOM_L_SUB:
case AMDILISD::ATOM_L_RSUB:
case AMDILISD::ATOM_L_XCHG:
case AMDILISD::ATOM_L_XOR:
case AMDILISD::ATOM_L_ADD_NORET:
case AMDILISD::ATOM_L_AND_NORET:
case AMDILISD::ATOM_L_MAX_NORET:
case AMDILISD::ATOM_L_UMAX_NORET:
case AMDILISD::ATOM_L_MIN_NORET:
case AMDILISD::ATOM_L_UMIN_NORET:
case AMDILISD::ATOM_L_OR_NORET:
case AMDILISD::ATOM_L_SUB_NORET:
case AMDILISD::ATOM_L_RSUB_NORET:
case AMDILISD::ATOM_L_XCHG_NORET:
case AMDILISD::ATOM_L_XOR_NORET:
case AMDILISD::ATOM_R_ADD:
case AMDILISD::ATOM_R_AND:
case AMDILISD::ATOM_R_MAX:
case AMDILISD::ATOM_R_UMAX:
case AMDILISD::ATOM_R_MIN:
case AMDILISD::ATOM_R_UMIN:
case AMDILISD::ATOM_R_OR:
case AMDILISD::ATOM_R_SUB:
case AMDILISD::ATOM_R_RSUB:
case AMDILISD::ATOM_R_XCHG:
case AMDILISD::ATOM_R_XOR:
case AMDILISD::ATOM_R_ADD_NORET:
case AMDILISD::ATOM_R_AND_NORET:
case AMDILISD::ATOM_R_MAX_NORET:
case AMDILISD::ATOM_R_UMAX_NORET:
case AMDILISD::ATOM_R_MIN_NORET:
case AMDILISD::ATOM_R_UMIN_NORET:
case AMDILISD::ATOM_R_OR_NORET:
case AMDILISD::ATOM_R_SUB_NORET:
case AMDILISD::ATOM_R_RSUB_NORET:
case AMDILISD::ATOM_R_XCHG_NORET:
case AMDILISD::ATOM_R_XOR_NORET:
case AMDILISD::ATOM_G_CMPXCHG:
case AMDILISD::ATOM_G_CMPXCHG_NORET:
case AMDILISD::ATOM_L_CMPXCHG:
case AMDILISD::ATOM_L_CMPXCHG_NORET:
case AMDILISD::ATOM_R_CMPXCHG:
case AMDILISD::ATOM_R_CMPXCHG_NORET:
break;
case AMDILISD::ATOM_G_DEC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_G_SUB;
}
break;
case AMDILISD::ATOM_G_INC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_G_ADD;
}
break;
case AMDILISD::ATOM_G_DEC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_G_SUB_NORET;
}
break;
case AMDILISD::ATOM_G_INC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_G_ADD_NORET;
}
break;
case AMDILISD::ATOM_L_DEC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_L_SUB;
}
break;
case AMDILISD::ATOM_L_INC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_L_ADD;
}
break;
case AMDILISD::ATOM_L_DEC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_L_SUB_NORET;
}
break;
case AMDILISD::ATOM_L_INC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_L_ADD_NORET;
}
break;
case AMDILISD::ATOM_R_DEC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_R_SUB;
}
break;
case AMDILISD::ATOM_R_INC:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_R_ADD;
}
break;
case AMDILISD::ATOM_R_DEC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_R_SUB_NORET;
}
break;
case AMDILISD::ATOM_R_INC_NORET:
addOne = true;
if (Subtarget.calVersion() >= CAL_VERSION_SC_136) {
addVal = (uint32_t)-1;
} else {
opc = AMDILISD::ATOM_R_ADD_NORET;
}
break;
}
// The largest node we can have is a cmpxchg w/ a return value and an output
// chain. The cmpxchg function has 3 inputs and a single output along with an
// output chain and a target constant, giving a total of 6.
SDValue Ops[12];
unsigned x = 0;
unsigned y = N->getNumOperands();
for (x = 0; x < y; ++x) {
Ops[x] = N->getOperand(x);
}
if (addOne) {
Ops[x++] = SDValue(SelectCode(CurDAG->getConstant(addVal, MVT::i32).getNode()), 0);
}
Ops[x++] = CurDAG->getTargetConstant(0, MVT::i32);
SDVTList Tys = N->getVTList();
MemSDNode *MemNode = dyn_cast<MemSDNode>(N);
assert(MemNode && "Atomic should be of MemSDNode type!");
N = CurDAG->getMemIntrinsicNode(opc, N->getDebugLoc(), Tys, Ops, x,
MemNode->getMemoryVT(), MemNode->getMemOperand()).getNode();
return N;
}
#ifdef DEBUGTMP
#undef INT64_C
#endif
#undef DEBUGTMP

//===-- AMDILISelLowering.h - AMDIL DAG Lowering Interface ------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file defines the interfaces that AMDIL uses to lower LLVM code into a
// selection DAG.
//
//===----------------------------------------------------------------------===//
#ifndef AMDIL_ISELLOWERING_H_
#define AMDIL_ISELLOWERING_H_
#include "AMDIL.h"
#include "llvm/CodeGen/CallingConvLower.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/Target/TargetLowering.h"
namespace llvm
{
namespace AMDILISD
{
enum
{
FIRST_NUMBER = ISD::BUILTIN_OP_END,
INTTOANY, // Dummy instruction that takes an int and goes to
// any type; converts the SDNode to an int
DP_TO_FP, // Conversion from 64bit FP to 32bit FP
FP_TO_DP, // Conversion from 32bit FP to 64bit FP
BITCONV, // instruction that converts from any type to any type
CMOV, // 32bit FP Conditional move instruction
CMOVLOG, // 32bit FP Conditional move logical instruction
SELECT, // 32bit FP Conditional select instruction
SETCC, // 32bit FP Set-on-condition instruction
ISGN, // 32bit Int Sign instruction
INEGATE, // 32bit Int Negation instruction
MAD, // 32bit Fused Multiply Add instruction
ADD, // 32/64 bit pseudo instruction
AND, // 128 bit and instruction
OR, // 128 bit or instruction
NOT, // 128 bit not instruction
XOR, // 128 bit xor instruction
MOVE, // generic mov instruction
PHIMOVE, // generic phi-node mov instruction
VBUILD, // scalar to vector mov instruction
VEXTRACT, // extract vector components
VINSERT, // insert vector components
VCONCAT, // concat a single vector to another vector
UMAD, // 32bit UInt Fused Multiply Add instruction
CALL, // Function call based on a single integer
RET, // Return from a function call
SELECT_CC, // Select the correct conditional instruction
BRCC, // Select the correct branch instruction
CMPCC, // Compare to GPR operands
CMPICC, // Compare two GPR operands, set icc.
CMPFCC, // Compare two FP operands, set fcc.
BRICC, // Branch to dest on icc condition
BRFCC, // Branch to dest on fcc condition
SELECT_ICC, // Select between two values using the current ICC flags
SELECT_FCC, // Select between two values using the current FCC flags
LCREATE, // Create a 64bit integer from two 32 bit integers
LCOMPHI, // Get the hi 32 bits from a 64 bit integer
LCOMPLO, // Get the lo 32 bits from a 64 bit integer
DCREATE, // Create a 64bit float from two 32 bit integers
DCOMPHI, // Get the hi 32 bits from a 64 bit float
DCOMPLO, // Get the lo 32 bits from a 64 bit float
LCREATE2, // Create a 64bit integer from two 32 bit integers
LCOMPHI2, // Get the hi 32 bits from a 64 bit integer
LCOMPLO2, // Get the lo 32 bits from a 64 bit integer
DCREATE2, // Create a 64bit float from two 32 bit integers
DCOMPHI2, // Get the hi 32 bits from a 64 bit float
DCOMPLO2, // Get the lo 32 bits from a 64 bit float
UMUL, // 32bit unsigned multiplication
IFFB_HI, // 32bit find first hi bit instruction
IFFB_LO, // 32bit find first low bit instruction
DIV_INF, // Divide with infinity returned on zero divisor
SMAX, // Signed integer max
CMP,
IL_CC_I_GT,
IL_CC_I_LT,
IL_CC_I_GE,
IL_CC_I_LE,
IL_CC_I_EQ,
IL_CC_I_NE,
RET_FLAG,
BRANCH_COND,
LOOP_NZERO,
LOOP_ZERO,
LOOP_CMP,
ADDADDR,
// ATOMIC Operations
// Global Memory
ATOM_G_ADD = ISD::FIRST_TARGET_MEMORY_OPCODE,
ATOM_G_AND,
ATOM_G_CMPXCHG,
ATOM_G_DEC,
ATOM_G_INC,
ATOM_G_MAX,
ATOM_G_UMAX,
ATOM_G_MIN,
ATOM_G_UMIN,
ATOM_G_OR,
ATOM_G_SUB,
ATOM_G_RSUB,
ATOM_G_XCHG,
ATOM_G_XOR,
ATOM_G_ADD_NORET,
ATOM_G_AND_NORET,
ATOM_G_CMPXCHG_NORET,
ATOM_G_DEC_NORET,
ATOM_G_INC_NORET,
ATOM_G_MAX_NORET,
ATOM_G_UMAX_NORET,
ATOM_G_MIN_NORET,
ATOM_G_UMIN_NORET,
ATOM_G_OR_NORET,
ATOM_G_SUB_NORET,
ATOM_G_RSUB_NORET,
ATOM_G_XCHG_NORET,
ATOM_G_XOR_NORET,
// Local Memory
ATOM_L_ADD,
ATOM_L_AND,
ATOM_L_CMPXCHG,
ATOM_L_DEC,
ATOM_L_INC,
ATOM_L_MAX,
ATOM_L_UMAX,
ATOM_L_MIN,
ATOM_L_UMIN,
ATOM_L_OR,
ATOM_L_MSKOR,
ATOM_L_SUB,
ATOM_L_RSUB,
ATOM_L_XCHG,
ATOM_L_XOR,
ATOM_L_ADD_NORET,
ATOM_L_AND_NORET,
ATOM_L_CMPXCHG_NORET,
ATOM_L_DEC_NORET,
ATOM_L_INC_NORET,
ATOM_L_MAX_NORET,
ATOM_L_UMAX_NORET,
ATOM_L_MIN_NORET,
ATOM_L_UMIN_NORET,
ATOM_L_OR_NORET,
ATOM_L_MSKOR_NORET,
ATOM_L_SUB_NORET,
ATOM_L_RSUB_NORET,
ATOM_L_XCHG_NORET,
ATOM_L_XOR_NORET,
// Region Memory
ATOM_R_ADD,
ATOM_R_AND,
ATOM_R_CMPXCHG,
ATOM_R_DEC,
ATOM_R_INC,
ATOM_R_MAX,
ATOM_R_UMAX,
ATOM_R_MIN,
ATOM_R_UMIN,
ATOM_R_OR,
ATOM_R_MSKOR,
ATOM_R_SUB,
ATOM_R_RSUB,
ATOM_R_XCHG,
ATOM_R_XOR,
ATOM_R_ADD_NORET,
ATOM_R_AND_NORET,
ATOM_R_CMPXCHG_NORET,
ATOM_R_DEC_NORET,
ATOM_R_INC_NORET,
ATOM_R_MAX_NORET,
ATOM_R_UMAX_NORET,
ATOM_R_MIN_NORET,
ATOM_R_UMIN_NORET,
ATOM_R_OR_NORET,
ATOM_R_MSKOR_NORET,
ATOM_R_SUB_NORET,
ATOM_R_RSUB_NORET,
ATOM_R_XCHG_NORET,
ATOM_R_XOR_NORET,
// Append buffer
APPEND_ALLOC,
APPEND_ALLOC_NORET,
APPEND_CONSUME,
APPEND_CONSUME_NORET,
// 2D Images
IMAGE2D_READ,
IMAGE2D_WRITE,
IMAGE2D_INFO0,
IMAGE2D_INFO1,
// 3D Images
IMAGE3D_READ,
IMAGE3D_WRITE,
IMAGE3D_INFO0,
IMAGE3D_INFO1,
LAST_ISD_NUMBER
};
} // AMDILISD
class MachineBasicBlock;
class MachineInstr;
class DebugLoc;
class TargetInstrInfo;
class AMDILTargetLowering : public TargetLowering
{
private:
int VarArgsFrameOffset; // Frame offset to start of varargs area.
public:
AMDILTargetLowering(TargetMachine &TM);
virtual SDValue
LowerOperation(SDValue Op, SelectionDAG &DAG) const;
int
getVarArgsFrameOffset() const;
/// computeMaskedBitsForTargetNode - Determine which of the bits specified
/// in Mask are known to be either zero or one and return them in the
/// KnownZero/KnownOne bitsets.
virtual void
computeMaskedBitsForTargetNode(
const SDValue Op,
APInt &KnownZero,
APInt &KnownOne,
const SelectionDAG &DAG,
unsigned Depth = 0
) const;
virtual MachineBasicBlock*
EmitInstrWithCustomInserter(
MachineInstr *MI,
MachineBasicBlock *MBB) const;
virtual bool
getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &I, unsigned Intrinsic) const;
virtual const char*
getTargetNodeName(
unsigned Opcode
) const;
// We want to mark f32/f64 floating point values as legal.
bool
isFPImmLegal(const APFloat &Imm, EVT VT) const;
// We don't want to shrink f64/f32 constants because they both take
// up the same amount of space and we don't want to use an f2d
// instruction.
bool ShouldShrinkFPConstant(EVT VT) const;
/// getFunctionAlignment - Return the Log2 alignment of this
/// function.
virtual unsigned int
getFunctionAlignment(const Function *F) const;
private:
CCAssignFn*
CCAssignFnForNode(unsigned int CC) const;
SDValue LowerCallResult(SDValue Chain,
SDValue InFlag,
CallingConv::ID CallConv,
bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
DebugLoc dl,
SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;
SDValue LowerMemArgument(SDValue Chain,
CallingConv::ID CallConv,
const SmallVectorImpl<ISD::InputArg> &ArgInfo,
DebugLoc dl, SelectionDAG &DAG,
const CCValAssign &VA, MachineFrameInfo *MFI,
unsigned i) const;
SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr,
SDValue Arg,
DebugLoc dl, SelectionDAG &DAG,
const CCValAssign &VA,
ISD::ArgFlagsTy Flags) const;
virtual SDValue
LowerFormalArguments(SDValue Chain,
CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,
DebugLoc dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;
virtual SDValue
LowerCall(SDValue Chain, SDValue Callee,
CallingConv::ID CallConv, bool isVarArg, bool doesNotRet,
bool &isTailCall,
const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,
const SmallVectorImpl<ISD::InputArg> &Ins,
DebugLoc dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const;
virtual SDValue
LowerReturn(SDValue Chain,
CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals,
DebugLoc dl, SelectionDAG &DAG) const;
//+++--- Functions dealing with conversions between floating point and
// integer types ---+++//
SDValue
genCLZu64(SDValue Op, SelectionDAG &DAG) const;
SDValue
genCLZuN(SDValue Op, SelectionDAG &DAG, uint32_t bits) const;
SDValue
genCLZu32(SDValue Op, SelectionDAG &DAG) const;
SDValue
genf64toi32(SDValue Op, SelectionDAG &DAG,
bool includeSign) const;
SDValue
genf64toi64(SDValue Op, SelectionDAG &DAG,
bool includeSign) const;
SDValue
genu32tof64(SDValue Op, EVT dblvt, SelectionDAG &DAG) const;
SDValue
genu64tof64(SDValue Op, EVT dblvt, SelectionDAG &DAG) const;
SDValue
LowerFP_TO_SINT(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerFP_TO_UINT(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG& DAG) const;
SDValue
LowerINTRINSIC_W_CHAIN(SDValue Op, SelectionDAG& DAG) const;
SDValue
LowerINTRINSIC_VOID(SDValue Op, SelectionDAG& DAG) const;
SDValue
LowerJumpTable(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerExternalSymbol(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerADD(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSUB(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSREM(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSREM8(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSREM16(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSREM32(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSREM64(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUREM(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUREM8(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUREM16(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUREM32(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUREM64(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSDIV24(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSDIV32(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSDIV64(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUDIV24(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUDIV32(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerUDIV64(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerFDIV(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerFDIV32(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerFDIV64(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerMUL(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerAND(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerOR(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSELECT(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSETCC(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;
EVT
genIntType(uint32_t size = 32, uint32_t numEle = 1) const;
SDValue
LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
SDValue
LowerFP_ROUND(SDValue Op, SelectionDAG &DAG) const;
void
generateCMPInstr(MachineInstr*, MachineBasicBlock*,
const TargetInstrInfo&) const;
MachineOperand
convertToReg(MachineOperand) const;
// Private members used by the set of instruction generation
// functions. They are marked mutable because they are cached here
// so they don't have to be looked up repeatedly by the
// generateMachineInst/genVReg helpers; this keeps that code simpler
// and cleaner. The object itself doesn't logically change, as only
// these functions use this cached state.
mutable MachineBasicBlock *mBB;
mutable DebugLoc *mDL;
mutable const TargetInstrInfo *mTII;
mutable MachineBasicBlock::iterator mBBI;
void
setPrivateData(MachineBasicBlock *BB,
MachineBasicBlock::iterator &BBI,
DebugLoc *DL,
const TargetInstrInfo *TII) const;
uint32_t genVReg(uint32_t regType) const;
MachineInstrBuilder
generateMachineInst(uint32_t opcode,
uint32_t dst) const;
MachineInstrBuilder
generateMachineInst(uint32_t opcode,
uint32_t dst, uint32_t src1) const;
MachineInstrBuilder
generateMachineInst(uint32_t opcode,
uint32_t dst, uint32_t src1, uint32_t src2) const;
MachineInstrBuilder
generateMachineInst(uint32_t opcode,
uint32_t dst, uint32_t src1, uint32_t src2,
uint32_t src3) const;
uint32_t
addExtensionInstructions(
uint32_t reg, bool signedShift,
unsigned int simpleVT) const;
void
generateLongRelational(MachineInstr *MI,
unsigned int opCode) const;
}; // AMDILTargetLowering
} // end namespace llvm
#endif // AMDIL_ISELLOWERING_H_

@@ -0,0 +1,171 @@
//===-- AMDILImageExpansion.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// @file AMDILImageExpansion.cpp
// @details Implementation of the image expansion class for image-capable devices
//
#include "AMDILIOExpansion.h"
#include "AMDILKernelManager.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/CodeGen/MachineConstantPool.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Support/DebugLoc.h"
#include "llvm/Target/TargetInstrInfo.h"
#include "llvm/Value.h"
using namespace llvm;
AMDILImageExpansion::AMDILImageExpansion(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
: AMDIL789IOExpansion(tm AMDIL_OPT_LEVEL_VAR)
{
}
AMDILImageExpansion::~AMDILImageExpansion()
{
}
void AMDILImageExpansion::expandInefficientImageLoad(
MachineBasicBlock *mBB, MachineInstr *MI)
{
#if 0
const llvm::StringRef &name = MI->getOperand(0).getGlobal()->getName();
const char *tReg1, *tReg2, *tReg3, *tReg4;
tReg1 = mASM->getRegisterName(MI->getOperand(1).getReg());
if (MI->getOperand(2).isReg()) {
tReg2 = mASM->getRegisterName(MI->getOperand(2).getReg());
} else {
tReg2 = mASM->getRegisterName(AMDIL::R1);
O << "\tmov " << tReg2 << ", l" << MI->getOperand(2).getImm() << "\n";
}
if (MI->getOperand(3).isReg()) {
tReg3 = mASM->getRegisterName(MI->getOperand(3).getReg());
} else {
tReg3 = mASM->getRegisterName(AMDIL::R2);
O << "\tmov " << tReg3 << ", l" << MI->getOperand(3).getImm() << "\n";
}
if (MI->getOperand(4).isReg()) {
tReg4 = mASM->getRegisterName(MI->getOperand(4).getReg());
} else {
tReg4 = mASM->getRegisterName(AMDIL::R3);
O << "\tmov " << tReg4 << ", l" << MI->getOperand(4).getImm() << "\n";
}
bool internalSampler = false;
//bool linear = true;
unsigned ImageCount = 3; // OPENCL_MAX_READ_IMAGES
unsigned SamplerCount = 3; // OPENCL_MAX_SAMPLERS
if (ImageCount - 1) {
O << "\tswitch " << mASM->getRegisterName(MI->getOperand(1).getReg())
<< "\n";
}
for (unsigned rID = 0; rID < ImageCount; ++rID) {
if (ImageCount - 1) {
if (!rID) {
O << "\tdefault\n";
} else {
O << "\tcase " << rID << "\n" ;
}
O << "\tswitch " << mASM->getRegisterName(MI->getOperand(2).getReg())
<< "\n";
}
for (unsigned sID = 0; sID < SamplerCount; ++sID) {
if (SamplerCount - 1) {
if (!sID) {
O << "\tdefault\n";
} else {
O << "\tcase " << sID << "\n" ;
}
}
if (internalSampler) {
// Check if sampler has normalized setting.
O << "\tand r0.x, " << tReg2 << ".x, l0.y\n"
<< "\tif_logicalz r0.x\n"
<< "\tflr " << tReg3 << ", " << tReg3 << "\n"
<< "\tsample_resource(" << rID << ")_sampler("
<< sID << ")_coordtype(unnormalized) "
<< tReg1 << ", " << tReg3 << " ; " << name.data() << "\n"
<< "\telse\n"
<< "\tiadd " << tReg1 << ".y, " << tReg1 << ".x, l0.y\n"
<< "\titof " << tReg2 << ", cb1[" << tReg1 << ".x].xyz\n"
<< "\tmul " << tReg3 << ", " << tReg3 << ", " << tReg2 << "\n"
<< "\tflr " << tReg3 << ", " << tReg3 << "\n"
<< "\tmul " << tReg3 << ", " << tReg3 << ", cb1["
<< tReg1 << ".y].xyz\n"
<< "\tsample_resource(" << rID << ")_sampler("
<< sID << ")_coordtype(normalized) "
<< tReg1 << ", " << tReg3 << " ; " << name.data() << "\n"
<< "\tendif\n";
} else {
O << "\tiadd " << tReg1 << ".y, " << tReg1 << ".x, l0.y\n"
// Check if sampler has normalized setting.
<< "\tand r0, " << tReg2 << ".x, l0.y\n"
// Convert image dimensions to float.
<< "\titof " << tReg4 << ", cb1[" << tReg1 << ".x].xyz\n"
// Move into R0 1 if unnormalized or dimensions if normalized.
<< "\tcmov_logical r0, r0, " << tReg4 << ", r1.1111\n"
// Make coordinates unnormalized.
<< "\tmul " << tReg3 << ", r0, " << tReg3 << "\n"
// Get linear filtering if set.
<< "\tand " << tReg4 << ", " << tReg2 << ".x, l6.x\n"
// Save unnormalized coordinates in R0.
<< "\tmov r0, " << tReg3 << "\n"
// Floor the coordinates due to HW incompatibility with precision
// requirements.
<< "\tflr " << tReg3 << ", " << tReg3 << "\n"
// Get original coordinates (without floor) if linear filtering is set.
<< "\tcmov_logical " << tReg3 << ", " << tReg4
<< ".xxxx, r0, " << tReg3 << "\n"
// Normalize the coordinates with multiplying by 1/dimensions
<< "\tmul " << tReg3 << ", " << tReg3 << ", cb1["
<< tReg1 << ".y].xyz\n"
<< "\tsample_resource(" << rID << ")_sampler("
<< sID << ")_coordtype(normalized) "
<< tReg1 << ", " << tReg3 << " ; " << name.data() << "\n";
}
if (SamplerCount - 1) {
O << "\tbreak\n";
}
}
if (SamplerCount - 1) {
O << "\tendswitch\n";
}
if (ImageCount - 1) {
O << "\tbreak\n";
}
}
if (ImageCount - 1) {
O << "\tendswitch\n";
}
#endif
}
void
AMDILImageExpansion::expandImageLoad(MachineBasicBlock *mBB, MachineInstr *MI)
{
uint32_t imageID = getPointerID(MI);
MI->getOperand(1).ChangeToImmediate(imageID);
saveInst = true;
}
void
AMDILImageExpansion::expandImageStore(MachineBasicBlock *mBB, MachineInstr *MI)
{
uint32_t imageID = getPointerID(MI);
mKM->setOutputInst();
MI->getOperand(0).ChangeToImmediate(imageID);
saveInst = true;
}
void
AMDILImageExpansion::expandImageParam(MachineBasicBlock *mBB, MachineInstr *MI)
{
MachineBasicBlock::iterator I = *MI;
uint32_t ID = getPointerID(MI);
DebugLoc DL = MI->getDebugLoc();
BuildMI(*mBB, I, DL, mTII->get(AMDIL::CBLOAD),
MI->getOperand(0).getReg())
.addImm(ID)
.addImm(1);
}

@@ -0,0 +1,271 @@
//===-- AMDILInliner.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#define DEBUG_TYPE "amdilinline"
#include "AMDIL.h"
#include "AMDILCompilerErrors.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILSubtarget.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Function.h"
#include "llvm/Instructions.h"
#include "llvm/IntrinsicInst.h"
#include "llvm/Support/CallSite.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetData.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Utils/Cloning.h"
#include "llvm/Transforms/Utils/Local.h"
using namespace llvm;
namespace
{
class LLVM_LIBRARY_VISIBILITY AMDILInlinePass: public FunctionPass
{
public:
TargetMachine &TM;
static char ID;
AMDILInlinePass(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
~AMDILInlinePass();
virtual const char* getPassName() const;
virtual bool runOnFunction(Function &F);
bool doInitialization(Module &M);
bool doFinalization(Module &M);
virtual void getAnalysisUsage(AnalysisUsage &AU) const;
private:
typedef DenseMap<const ArrayType*, SmallVector<AllocaInst*,
DEFAULT_VEC_SLOTS> > InlinedArrayAllocasTy;
bool
AMDILInlineCallIfPossible(CallSite CS,
const TargetData *TD,
InlinedArrayAllocasTy &InlinedArrayAllocas);
CodeGenOpt::Level OptLevel;
};
char AMDILInlinePass::ID = 0;
} // anonymous namespace
namespace llvm
{
FunctionPass*
createAMDILInlinePass(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
{
return new AMDILInlinePass(tm AMDIL_OPT_LEVEL_VAR);
}
} // llvm namespace
AMDILInlinePass::AMDILInlinePass(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
: FunctionPass(ID), TM(tm)
{
OptLevel = tm.getOptLevel();
}
AMDILInlinePass::~AMDILInlinePass()
{
}
bool
AMDILInlinePass::AMDILInlineCallIfPossible(CallSite CS,
const TargetData *TD, InlinedArrayAllocasTy &InlinedArrayAllocas) {
Function *Callee = CS.getCalledFunction();
Function *Caller = CS.getCaller();
// Try to inline the function. Get the list of static allocas that were
// inlined.
SmallVector<AllocaInst*, 16> StaticAllocas;
InlineFunctionInfo IFI;
if (!InlineFunction(CS, IFI))
return false;
DEBUG(errs() << "<amdilinline> function " << Caller->getName()
<< ": inlined call to "<< Callee->getName() << "\n");
// If the inlined function had a higher stack protection level than the
// calling function, then bump up the caller's stack protection level.
if (Callee->hasFnAttr(Attribute::StackProtectReq))
Caller->addFnAttr(Attribute::StackProtectReq);
else if (Callee->hasFnAttr(Attribute::StackProtect) &&
!Caller->hasFnAttr(Attribute::StackProtectReq))
Caller->addFnAttr(Attribute::StackProtect);
// Look at all of the allocas that we inlined through this call site. If we
// have already inlined other allocas through other calls into this function,
// then we know that they have disjoint lifetimes and that we can merge them.
//
// There are many heuristics possible for merging these allocas, and the
// different options have different tradeoffs. One thing that we *really*
// don't want to hurt is SRoA: once inlining happens, often allocas are no
// longer address taken and so they can be promoted.
//
// Our "solution" for that is to only merge allocas whose outermost type is an
// array type. These are usually not promoted because someone is using a
// variable index into them. These are also often the most important ones to
// merge.
//
// A better solution would be to have real memory lifetime markers in the IR
// and not have the inliner do any merging of allocas at all. This would
// allow the backend to do proper stack slot coloring of all allocas that
// *actually make it to the backend*, which is really what we want.
//
// Because we don't have this information, we do this simple and useful hack.
//
SmallPtrSet<AllocaInst*, 16> UsedAllocas;
// Loop over all the allocas we have so far and see if they can be merged with
// a previously inlined alloca. If not, remember that we had it.
for (unsigned AllocaNo = 0,
e = IFI.StaticAllocas.size();
AllocaNo != e; ++AllocaNo) {
AllocaInst *AI = IFI.StaticAllocas[AllocaNo];
// Don't bother trying to merge array allocations (they will usually be
// canonicalized to be an allocation *of* an array), or allocations whose
// type is not itself an array (because we're afraid of pessimizing SRoA).
const ArrayType *ATy = dyn_cast<ArrayType>(AI->getAllocatedType());
if (ATy == 0 || AI->isArrayAllocation())
continue;
// Get the list of all available allocas for this array type.
SmallVector<AllocaInst*, DEFAULT_VEC_SLOTS> &AllocasForType
= InlinedArrayAllocas[ATy];
// Loop over the allocas in AllocasForType to see if we can reuse one. Note
// that we have to be careful not to reuse the same "available" alloca for
// multiple different allocas that we just inlined, we use the 'UsedAllocas'
// set to keep track of which "available" allocas are being used by this
// function. Also, AllocasForType can be empty of course!
bool MergedAwayAlloca = false;
for (unsigned i = 0, e = AllocasForType.size(); i != e; ++i) {
AllocaInst *AvailableAlloca = AllocasForType[i];
// The available alloca has to be in the right function, not in some other
// function in this SCC.
if (AvailableAlloca->getParent() != AI->getParent())
continue;
// If the inlined function already uses this alloca then we can't reuse
// it.
if (!UsedAllocas.insert(AvailableAlloca))
continue;
// Otherwise, we *can* reuse it, RAUW AI into AvailableAlloca and declare
// success!
DEBUG(errs() << " ***MERGED ALLOCA: " << *AI);
AI->replaceAllUsesWith(AvailableAlloca);
AI->eraseFromParent();
MergedAwayAlloca = true;
break;
}
// If we already nuked the alloca, we're done with it.
if (MergedAwayAlloca)
continue;
// If we were unable to merge away the alloca either because there are no
// allocas of the right type available or because we reused them all
// already, remember that this alloca came from an inlined function and mark
// it used so we don't reuse it for other allocas from this inline
// operation.
AllocasForType.push_back(AI);
UsedAllocas.insert(AI);
}
return true;
}
bool
AMDILInlinePass::runOnFunction(Function &MF)
{
Function *F = &MF;
const AMDILSubtarget &STM = TM.getSubtarget<AMDILSubtarget>();
if (STM.device()->isSupported(AMDILDeviceInfo::NoInline)) {
return false;
}
const TargetData *TD = getAnalysisIfAvailable<TargetData>();
SmallVector<CallSite, 16> CallSites;
for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB) {
for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
CallSite CS = CallSite(cast<Value>(I));
// If this isn't a call, or it is a call to an intrinsic, it can
// never be inlined.
if (CS.getInstruction() == 0 || isa<IntrinsicInst>(I))
continue;
// If this is a direct call to an external function, we can never inline
// it. If it is an indirect call, inlining may resolve it to be a
// direct call, so we keep it.
if (CS.getCalledFunction() && CS.getCalledFunction()->isDeclaration())
continue;
// We don't want to inline if we are recursive.
if (CS.getCalledFunction() && CS.getCalledFunction()->getName() == MF.getName()) {
AMDILMachineFunctionInfo *MFI =
getAnalysis<MachineFunctionAnalysis>().getMF()
.getInfo<AMDILMachineFunctionInfo>();
MFI->addErrorMsg(amd::CompilerErrorMessage[RECURSIVE_FUNCTION]);
continue;
}
CallSites.push_back(CS);
}
}
InlinedArrayAllocasTy InlinedArrayAllocas;
bool Changed = false;
for (unsigned CSi = 0; CSi != CallSites.size(); ++CSi) {
CallSite CS = CallSites[CSi];
Function *Callee = CS.getCalledFunction();
// We can only inline direct calls to non-declarations.
if (Callee == 0 || Callee->isDeclaration()) continue;
// Attempt to inline the function...
if (!AMDILInlineCallIfPossible(CS, TD, InlinedArrayAllocas))
continue;
Changed = true;
}
return Changed;
}
const char*
AMDILInlinePass::getPassName() const
{
return "AMDIL Inline Function Pass";
}
bool
AMDILInlinePass::doInitialization(Module &M)
{
return false;
}
bool
AMDILInlinePass::doFinalization(Module &M)
{
return false;
}
void
AMDILInlinePass::getAnalysisUsage(AnalysisUsage &AU) const
{
AU.addRequired<MachineFunctionAnalysis>();
FunctionPass::getAnalysisUsage(AU);
AU.setPreservesAll();
}

@@ -0,0 +1,709 @@
//===- AMDILInstrInfo.cpp - AMDIL Instruction Information -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file contains the AMDIL implementation of the TargetInstrInfo class.
//
//===----------------------------------------------------------------------===//
#include "AMDILInstrInfo.h"
#include "AMDILUtilityFunctions.h"
#define GET_INSTRINFO_CTOR
#include "AMDILGenInstrInfo.inc"
#include "AMDILInstrInfo.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/PseudoSourceValue.h"
#include "llvm/Instructions.h"
using namespace llvm;
AMDILInstrInfo::AMDILInstrInfo(AMDILTargetMachine &tm)
: AMDILGenInstrInfo(AMDIL::ADJCALLSTACKDOWN, AMDIL::ADJCALLSTACKUP),
RI(tm, *this),
TM(tm) {
}
const AMDILRegisterInfo &AMDILInstrInfo::getRegisterInfo() const {
return RI;
}
/// Return true if the instruction is a register to register move and leave the
/// source and dest operands in the passed parameters.
bool AMDILInstrInfo::isMoveInstr(const MachineInstr &MI, unsigned int &SrcReg,
unsigned int &DstReg, unsigned int &SrcSubIdx,
unsigned int &DstSubIdx) const {
// FIXME: we should look for:
// add with 0
//assert(0 && "is Move Instruction has not been implemented yet!");
//return true;
if (!isMove(MI.getOpcode())) {
return false;
}
if (!MI.getOperand(0).isReg() || !MI.getOperand(1).isReg()) {
return false;
}
SrcReg = MI.getOperand(1).getReg();
DstReg = MI.getOperand(0).getReg();
DstSubIdx = 0;
SrcSubIdx = 0;
return true;
}
bool AMDILInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
unsigned &SrcReg, unsigned &DstReg,
unsigned &SubIdx) const {
// TODO: Implement this function
return false;
}
unsigned AMDILInstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
int &FrameIndex) const {
// TODO: Implement this function
return 0;
}
unsigned AMDILInstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
int &FrameIndex) const {
// TODO: Implement this function
return 0;
}
bool AMDILInstrInfo::hasLoadFromStackSlot(const MachineInstr *MI,
const MachineMemOperand *&MMO,
int &FrameIndex) const {
// TODO: Implement this function
return false;
}
unsigned AMDILInstrInfo::isStoreFromStackSlot(const MachineInstr *MI,
int &FrameIndex) const {
// TODO: Implement this function
return 0;
}
unsigned AMDILInstrInfo::isStoreFromStackSlotPostFE(const MachineInstr *MI,
int &FrameIndex) const {
// TODO: Implement this function
return 0;
}
bool AMDILInstrInfo::hasStoreFromStackSlot(const MachineInstr *MI,
const MachineMemOperand *&MMO,
int &FrameIndex) const {
// TODO: Implement this function
return false;
}
#if 0
void
AMDILInstrInfo::reMaterialize(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned DestReg, unsigned SubIdx,
const MachineInstr *Orig,
const TargetRegisterInfo *TRI) const {
// TODO: Implement this function
}
MachineInstr *AMDILInstrInfo::duplicate(MachineInstr *Orig,
MachineFunction &MF) const {
// TODO: Implement this function
return NULL;
}
#endif
MachineInstr *
AMDILInstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
MachineBasicBlock::iterator &MBBI,
LiveVariables *LV) const {
// TODO: Implement this function
return NULL;
}
#if 0
MachineInstr *AMDILInstrInfo::commuteInstruction(MachineInstr *MI,
bool NewMI = false) const {
// TODO: Implement this function
return NULL;
}
bool
AMDILInstrInfo::findCommutedOpIndices(MachineInstr *MI, unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2) const
{
// TODO: Implement this function
}
bool
AMDILInstrInfo::produceSameValue(const MachineInstr *MI0,
const MachineInstr *MI1) const
{
// TODO: Implement this function
}
#endif
bool AMDILInstrInfo::getNextBranchInstr(MachineBasicBlock::iterator &iter,
MachineBasicBlock &MBB) const {
while (iter != MBB.end()) {
switch (iter->getOpcode()) {
default:
break;
ExpandCaseToAllScalarTypes(AMDIL::BRANCH_COND);
case AMDIL::BRANCH:
return true;
};
++iter;
}
return false;
}
bool AMDILInstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const {
bool retVal = true;
// Branch analysis is disabled for now; this early return leaves the
// code below unreachable.
return retVal;
MachineBasicBlock::iterator iter = MBB.begin();
if (!getNextBranchInstr(iter, MBB)) {
retVal = false;
} else {
MachineInstr *firstBranch = iter;
if (!getNextBranchInstr(++iter, MBB)) {
if (firstBranch->getOpcode() == AMDIL::BRANCH) {
TBB = firstBranch->getOperand(0).getMBB();
firstBranch->eraseFromParent();
retVal = false;
} else {
TBB = firstBranch->getOperand(0).getMBB();
FBB = *(++MBB.succ_begin());
if (FBB == TBB) {
FBB = *(MBB.succ_begin());
}
Cond.push_back(firstBranch->getOperand(1));
retVal = false;
}
} else {
MachineInstr *secondBranch = iter;
if (!getNextBranchInstr(++iter, MBB)) {
if (secondBranch->getOpcode() == AMDIL::BRANCH) {
TBB = firstBranch->getOperand(0).getMBB();
Cond.push_back(firstBranch->getOperand(1));
FBB = secondBranch->getOperand(0).getMBB();
secondBranch->eraseFromParent();
retVal = false;
} else {
assert(0 && "Should not have two consecutive conditional branches");
}
} else {
MBB.getParent()->viewCFG();
assert(0 && "Should not have three branch instructions in"
" a single basic block");
retVal = false;
}
}
}
return retVal;
}
unsigned int AMDILInstrInfo::getBranchInstr(const MachineOperand &op) const {
const MachineInstr *MI = op.getParent();
switch (MI->getDesc().OpInfo->RegClass) {
default: // FIXME: fallthrough??
case AMDIL::GPRI8RegClassID: return AMDIL::BRANCH_COND_i8;
case AMDIL::GPRI16RegClassID: return AMDIL::BRANCH_COND_i16;
case AMDIL::GPRI32RegClassID: return AMDIL::BRANCH_COND_i32;
case AMDIL::GPRI64RegClassID: return AMDIL::BRANCH_COND_i64;
case AMDIL::GPRF32RegClassID: return AMDIL::BRANCH_COND_f32;
case AMDIL::GPRF64RegClassID: return AMDIL::BRANCH_COND_f64;
};
}
unsigned int
AMDILInstrInfo::InsertBranch(MachineBasicBlock &MBB,
MachineBasicBlock *TBB,
MachineBasicBlock *FBB,
const SmallVectorImpl<MachineOperand> &Cond,
DebugLoc DL) const
{
assert(TBB && "InsertBranch must not be told to insert a fallthrough");
for (unsigned int x = 0; x < Cond.size(); ++x) {
Cond[x].getParent()->dump();
}
if (FBB == 0) {
if (Cond.empty()) {
BuildMI(&MBB, DL, get(AMDIL::BRANCH)).addMBB(TBB);
} else {
BuildMI(&MBB, DL, get(getBranchInstr(Cond[0])))
.addMBB(TBB).addReg(Cond[0].getReg());
}
return 1;
} else {
BuildMI(&MBB, DL, get(getBranchInstr(Cond[0])))
.addMBB(TBB).addReg(Cond[0].getReg());
BuildMI(&MBB, DL, get(AMDIL::BRANCH)).addMBB(FBB);
}
assert(0 && "Inserting two branches not supported");
return 0;
}
unsigned int AMDILInstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
MachineBasicBlock::iterator I = MBB.end();
if (I == MBB.begin()) {
return 0;
}
--I;
switch (I->getOpcode()) {
default:
return 0;
ExpandCaseToAllScalarTypes(AMDIL::BRANCH_COND);
case AMDIL::BRANCH:
I->eraseFromParent();
break;
}
I = MBB.end();
if (I == MBB.begin()) {
return 1;
}
--I;
switch (I->getOpcode()) {
// FIXME: only one case??
default:
return 1;
ExpandCaseToAllScalarTypes(AMDIL::BRANCH_COND);
I->eraseFromParent();
break;
}
return 2;
}
MachineBasicBlock::iterator skipFlowControl(MachineBasicBlock *MBB) {
MachineBasicBlock::iterator tmp = MBB->end();
if (!MBB->size()) {
return MBB->end();
}
while (--tmp) {
if (tmp->getOpcode() == AMDIL::ENDLOOP
|| tmp->getOpcode() == AMDIL::ENDIF
|| tmp->getOpcode() == AMDIL::ELSE) {
if (tmp == MBB->begin()) {
return tmp;
} else {
continue;
}
} else {
return ++tmp;
}
}
return MBB->end();
}
bool
AMDILInstrInfo::copyRegToReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
unsigned DestReg, unsigned SrcReg,
const TargetRegisterClass *DestRC,
const TargetRegisterClass *SrcRC,
DebugLoc DL) const {
// If we are adding to the end of a basic block we can safely assume that
// the move is caused by a PHI node, since all non-PHI move instructions
// have already been inserted into the basic block. Therefore we call
// skipFlowControl to move the iterator before the flow control
// instructions and put the move instruction there.
bool phi = (DestReg < 1025) || (SrcReg < 1025);
int movInst = phi ? getMoveInstFromID(DestRC->getID())
: getPHIMoveInstFromID(DestRC->getID());
MachineBasicBlock::iterator iTemp = (I == MBB.end()) ? skipFlowControl(&MBB)
: I;
if (DestRC != SrcRC) {
//int convInst;
size_t dSize = DestRC->getSize();
size_t sSize = SrcRC->getSize();
if (dSize > sSize) {
// Elements are going to get duplicated.
BuildMI(MBB, iTemp, DL, get(movInst), DestReg).addReg(SrcReg);
} else if (dSize == sSize) {
// Direct copy, conversions are not handled.
BuildMI(MBB, iTemp, DL, get(movInst), DestReg).addReg(SrcReg);
} else if (dSize < sSize) {
// Elements are going to get dropped.
BuildMI(MBB, iTemp, DL, get(movInst), DestReg).addReg(SrcReg);
}
} else {
BuildMI( MBB, iTemp, DL, get(movInst), DestReg).addReg(SrcReg);
}
return true;
}
void
AMDILInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI, DebugLoc DL,
unsigned DestReg, unsigned SrcReg,
bool KillSrc) const
{
BuildMI(MBB, MI, DL, get(AMDIL::MOVE_v4i32), DestReg)
.addReg(SrcReg, getKillRegState(KillSrc));
return;
#if 0
DEBUG(dbgs() << "Cannot copy " << RI.getName(SrcReg)
<< " to " << RI.getName(DestReg) << '\n');
abort();
#endif
}
void
AMDILInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned SrcReg, bool isKill,
int FrameIndex,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {
unsigned int Opc = 0;
// MachineInstr *curMI = MI;
MachineFunction &MF = *(MBB.getParent());
MachineFrameInfo &MFI = *MF.getFrameInfo();
DebugLoc DL;
switch (RC->getID()) {
default:
Opc = AMDIL::PRIVATESTORE_v4i32;
break;
case AMDIL::GPRF32RegClassID:
Opc = AMDIL::PRIVATESTORE_f32;
break;
case AMDIL::GPRF64RegClassID:
Opc = AMDIL::PRIVATESTORE_f64;
break;
case AMDIL::GPRI16RegClassID:
Opc = AMDIL::PRIVATESTORE_i16;
break;
case AMDIL::GPRI32RegClassID:
Opc = AMDIL::PRIVATESTORE_i32;
break;
case AMDIL::GPRI8RegClassID:
Opc = AMDIL::PRIVATESTORE_i8;
break;
case AMDIL::GPRI64RegClassID:
Opc = AMDIL::PRIVATESTORE_i64;
break;
case AMDIL::GPRV2F32RegClassID:
Opc = AMDIL::PRIVATESTORE_v2f32;
break;
case AMDIL::GPRV2F64RegClassID:
Opc = AMDIL::PRIVATESTORE_v2f64;
break;
case AMDIL::GPRV2I16RegClassID:
Opc = AMDIL::PRIVATESTORE_v2i16;
break;
case AMDIL::GPRV2I32RegClassID:
Opc = AMDIL::PRIVATESTORE_v2i32;
break;
case AMDIL::GPRV2I8RegClassID:
Opc = AMDIL::PRIVATESTORE_v2i8;
break;
case AMDIL::GPRV2I64RegClassID:
Opc = AMDIL::PRIVATESTORE_v2i64;
break;
case AMDIL::GPRV4F32RegClassID:
Opc = AMDIL::PRIVATESTORE_v4f32;
break;
case AMDIL::GPRV4I16RegClassID:
Opc = AMDIL::PRIVATESTORE_v4i16;
break;
case AMDIL::GPRV4I32RegClassID:
Opc = AMDIL::PRIVATESTORE_v4i32;
break;
case AMDIL::GPRV4I8RegClassID:
Opc = AMDIL::PRIVATESTORE_v4i8;
break;
}
if (MI != MBB.end()) DL = MI->getDebugLoc();
MachineMemOperand *MMO =
new MachineMemOperand(
MachinePointerInfo::getFixedStack(FrameIndex),
MachineMemOperand::MOStore,
MFI.getObjectSize(FrameIndex),
MFI.getObjectAlignment(FrameIndex));
MachineInstr *nMI = BuildMI(MBB, MI, DL, get(Opc))
.addReg(SrcReg, getKillRegState(isKill))
.addFrameIndex(FrameIndex)
.addMemOperand(MMO)
.addImm(0);
AMDILAS::InstrResEnc curRes;
curRes.bits.ResourceID
= TM.getSubtargetImpl()->device()->getResourceID(AMDILDevice::SCRATCH_ID);
setAsmPrinterFlags(nMI, curRes);
}
void
AMDILInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned DestReg, int FrameIndex,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {
unsigned int Opc = 0;
MachineFunction &MF = *(MBB.getParent());
MachineFrameInfo &MFI = *MF.getFrameInfo();
DebugLoc DL;
switch (RC->getID()) {
default:
Opc = AMDIL::PRIVATELOAD_v4i32;
break;
case AMDIL::GPRF32RegClassID:
Opc = AMDIL::PRIVATELOAD_f32;
break;
case AMDIL::GPRF64RegClassID:
Opc = AMDIL::PRIVATELOAD_f64;
break;
case AMDIL::GPRI16RegClassID:
Opc = AMDIL::PRIVATELOAD_i16;
break;
case AMDIL::GPRI32RegClassID:
Opc = AMDIL::PRIVATELOAD_i32;
break;
case AMDIL::GPRI8RegClassID:
Opc = AMDIL::PRIVATELOAD_i8;
break;
case AMDIL::GPRI64RegClassID:
Opc = AMDIL::PRIVATELOAD_i64;
break;
case AMDIL::GPRV2F32RegClassID:
Opc = AMDIL::PRIVATELOAD_v2f32;
break;
case AMDIL::GPRV2F64RegClassID:
Opc = AMDIL::PRIVATELOAD_v2f64;
break;
case AMDIL::GPRV2I16RegClassID:
Opc = AMDIL::PRIVATELOAD_v2i16;
break;
case AMDIL::GPRV2I32RegClassID:
Opc = AMDIL::PRIVATELOAD_v2i32;
break;
case AMDIL::GPRV2I8RegClassID:
Opc = AMDIL::PRIVATELOAD_v2i8;
break;
case AMDIL::GPRV2I64RegClassID:
Opc = AMDIL::PRIVATELOAD_v2i64;
break;
case AMDIL::GPRV4F32RegClassID:
Opc = AMDIL::PRIVATELOAD_v4f32;
break;
case AMDIL::GPRV4I16RegClassID:
Opc = AMDIL::PRIVATELOAD_v4i16;
break;
case AMDIL::GPRV4I32RegClassID:
Opc = AMDIL::PRIVATELOAD_v4i32;
break;
case AMDIL::GPRV4I8RegClassID:
Opc = AMDIL::PRIVATELOAD_v4i8;
break;
}
MachineMemOperand *MMO =
new MachineMemOperand(
MachinePointerInfo::getFixedStack(FrameIndex),
MachineMemOperand::MOLoad,
MFI.getObjectSize(FrameIndex),
MFI.getObjectAlignment(FrameIndex));
if (MI != MBB.end()) {
DL = MI->getDebugLoc();
}
MachineInstr* nMI = BuildMI(MBB, MI, DL, get(Opc))
.addReg(DestReg, RegState::Define)
.addFrameIndex(FrameIndex)
.addMemOperand(MMO)
.addImm(0);
AMDILAS::InstrResEnc curRes;
curRes.bits.ResourceID
= TM.getSubtargetImpl()->device()->getResourceID(AMDILDevice::SCRATCH_ID);
setAsmPrinterFlags(nMI, curRes);
}
MachineInstr *
AMDILInstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
int FrameIndex) const {
// TODO: Implement this function
return 0;
}
MachineInstr*
AMDILInstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
MachineInstr *LoadMI) const {
// TODO: Implement this function
return 0;
}
bool
AMDILInstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops) const
{
// TODO: Implement this function
return false;
}
bool
AMDILInstrInfo::unfoldMemoryOperand(MachineFunction &MF, MachineInstr *MI,
unsigned Reg, bool UnfoldLoad,
bool UnfoldStore,
SmallVectorImpl<MachineInstr*> &NewMIs) const {
// TODO: Implement this function
return false;
}
bool
AMDILInstrInfo::unfoldMemoryOperand(SelectionDAG &DAG, SDNode *N,
SmallVectorImpl<SDNode*> &NewNodes) const {
// TODO: Implement this function
return false;
}
unsigned
AMDILInstrInfo::getOpcodeAfterMemoryUnfold(unsigned Opc,
bool UnfoldLoad, bool UnfoldStore,
unsigned *LoadRegIndex) const {
// TODO: Implement this function
return 0;
}
bool
AMDILInstrInfo::areLoadsFromSameBasePtr(SDNode *Load1, SDNode *Load2,
int64_t &Offset1,
int64_t &Offset2) const {
// Disabled: always bail out for now. The analysis below is kept for
// future use but is currently unreachable.
return false;
if (!Load1->isMachineOpcode() || !Load2->isMachineOpcode()) {
return false;
}
const MachineSDNode *mload1 = dyn_cast<MachineSDNode>(Load1);
const MachineSDNode *mload2 = dyn_cast<MachineSDNode>(Load2);
if (!mload1 || !mload2) {
return false;
}
if (mload1->memoperands_empty() ||
mload2->memoperands_empty()) {
return false;
}
MachineMemOperand *memOp1 = (*mload1->memoperands_begin());
MachineMemOperand *memOp2 = (*mload2->memoperands_begin());
const Value *mv1 = memOp1->getValue();
const Value *mv2 = memOp2->getValue();
if (!memOp1->isLoad() || !memOp2->isLoad()) {
return false;
}
if (getBasePointerValue(mv1) == getBasePointerValue(mv2)) {
if (isa<GetElementPtrInst>(mv1) && isa<GetElementPtrInst>(mv2)) {
const GetElementPtrInst *gep1 = dyn_cast<GetElementPtrInst>(mv1);
const GetElementPtrInst *gep2 = dyn_cast<GetElementPtrInst>(mv2);
if (!gep1 || !gep2) {
return false;
}
if (gep1->getNumOperands() != gep2->getNumOperands()) {
return false;
}
for (unsigned i = 0, e = gep1->getNumOperands() - 1; i < e; ++i) {
const Value *op1 = gep1->getOperand(i);
const Value *op2 = gep2->getOperand(i);
if (op1 != op2) {
// If any value except the last one is different, return false.
return false;
}
}
unsigned size = gep1->getNumOperands()-1;
if (!isa<ConstantInt>(gep1->getOperand(size))
|| !isa<ConstantInt>(gep2->getOperand(size))) {
return false;
}
Offset1 = dyn_cast<ConstantInt>(gep1->getOperand(size))->getSExtValue();
Offset2 = dyn_cast<ConstantInt>(gep2->getOperand(size))->getSExtValue();
return true;
} else if (isa<Argument>(mv1) && isa<Argument>(mv2)) {
return false;
} else if (isa<GlobalValue>(mv1) && isa<GlobalValue>(mv2)) {
return false;
}
}
return false;
}
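The GEP comparison above can be restated without the LLVM types. The following is a simplified, hypothetical analog (plain integer operand lists stand in for `GetElementPtrInst` operands; this is not the LLVM API): two accesses share a base pointer when every operand except the last matches, and the trailing constants become the two offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical analog of the GEP check in areLoadsFromSameBasePtr():
// two "GEPs" point into the same object when all operands except the
// last agree; the last operands are then the constant offsets.
static std::optional<std::pair<int64_t, int64_t>>
sameBaseOffsets(const std::vector<int64_t> &gep1,
                const std::vector<int64_t> &gep2) {
  if (gep1.empty() || gep1.size() != gep2.size())
    return std::nullopt;
  for (size_t i = 0, e = gep1.size() - 1; i < e; ++i)
    if (gep1[i] != gep2[i]) // any operand except the last differs
      return std::nullopt;
  return std::make_pair(gep1.back(), gep2.back());
}
```

The real function additionally requires both trailing operands to be `ConstantInt`s and both nodes to be loads; this sketch keeps only the operand comparison.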
bool AMDILInstrInfo::shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,
int64_t Offset1, int64_t Offset2,
unsigned NumLoads) const {
assert(Offset2 > Offset1
&& "Second offset should be larger than first offset!");
// If we have fewer than 16 loads in a row, and the offsets are within
// 16 bytes, then schedule together.
// TODO: make the loads schedule near if they fit in a cache line.
return (NumLoads < 16 && (Offset2 - Offset1) < 16);
}
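The clustering heuristic is simple enough to restate standalone; this sketch (the function name is mine, not part of the backend) mirrors the return expression above:

```cpp
#include <cassert>
#include <cstdint>

// Loads are scheduled near each other only while the run is short
// (fewer than 16 loads) and the two offsets are within 16 bytes.
static bool scheduleNear(int64_t offset1, int64_t offset2,
                         unsigned numLoads) {
  return numLoads < 16 && (offset2 - offset1) < 16;
}
```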
bool
AMDILInstrInfo::ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond)
const {
// TODO: Implement this function
return true;
}
void AMDILInstrInfo::insertNoop(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const {
// TODO: Implement this function
}
bool AMDILInstrInfo::isPredicated(const MachineInstr *MI) const {
// TODO: Implement this function
return false;
}
#if 0
bool AMDILInstrInfo::isUnpredicatedTerminator(const MachineInstr *MI) const {
// TODO: Implement this function
}
bool AMDILInstrInfo::PredicateInstruction(MachineInstr *MI,
const SmallVectorImpl<MachineOperand> &Pred) const {
// TODO: Implement this function
}
#endif
bool
AMDILInstrInfo::SubsumesPredicate(const SmallVectorImpl<MachineOperand> &Pred1,
const SmallVectorImpl<MachineOperand> &Pred2)
const {
// TODO: Implement this function
return false;
}
bool AMDILInstrInfo::DefinesPredicate(MachineInstr *MI,
std::vector<MachineOperand> &Pred) const {
// TODO: Implement this function
return false;
}
bool AMDILInstrInfo::isPredicable(MachineInstr *MI) const {
// TODO: Implement this function
return MI->getDesc().isPredicable();
}
bool
AMDILInstrInfo::isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const {
// TODO: Implement this function
return true;
}
unsigned AMDILInstrInfo::GetInstSizeInBytes(const MachineInstr *MI) const {
// TODO: Implement this function
return 0;
}
#if 0
unsigned
AMDILInstrInfo::GetFunctionSizeInBytes(const MachineFunction &MF) const {
// TODO: Implement this function
return 0;
}
unsigned AMDILInstrInfo::getInlineAsmLength(const char *Str,
const MCAsmInfo &MAI) const {
// TODO: Implement this function
return 0;
}
#endif


@@ -0,0 +1,175 @@
//===- AMDILInstrInfo.h - AMDIL Instruction Information ---------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file contains the AMDIL implementation of the TargetInstrInfo class.
//
//===----------------------------------------------------------------------===//
#ifndef AMDILINSTRUCTIONINFO_H_
#define AMDILINSTRUCTIONINFO_H_
#include "AMDILRegisterInfo.h"
#include "llvm/Target/TargetInstrInfo.h"
#define GET_INSTRINFO_HEADER
#include "AMDILGenInstrInfo.inc"
namespace llvm {
// AMDIL - This namespace holds all of the target specific flags that
// instruction info tracks.
//
class AMDILTargetMachine;
class AMDILInstrInfo : public AMDILGenInstrInfo {
private:
const AMDILRegisterInfo RI;
AMDILTargetMachine &TM;
bool getNextBranchInstr(MachineBasicBlock::iterator &iter,
MachineBasicBlock &MBB) const;
unsigned int getBranchInstr(const MachineOperand &op) const;
public:
explicit AMDILInstrInfo(AMDILTargetMachine &tm);
// getRegisterInfo - TargetInstrInfo is a superset of MRegister info. As
// such, whenever a client has an instance of instruction info, it should
// always be able to get register info as well (through this method).
const AMDILRegisterInfo &getRegisterInfo() const;
// Return true if the instruction is a register to register move and leave the
// source and dest operands in the passed parameters.
bool isMoveInstr(const MachineInstr &MI, unsigned int &SrcReg,
unsigned int &DstReg, unsigned int &SrcSubIdx,
unsigned int &DstSubIdx) const;
bool isCoalescableExtInstr(const MachineInstr &MI, unsigned &SrcReg,
unsigned &DstReg, unsigned &SubIdx) const;
unsigned isLoadFromStackSlot(const MachineInstr *MI, int &FrameIndex) const;
unsigned isLoadFromStackSlotPostFE(const MachineInstr *MI,
int &FrameIndex) const;
bool hasLoadFromStackSlot(const MachineInstr *MI,
const MachineMemOperand *&MMO,
int &FrameIndex) const;
unsigned isStoreFromStackSlot(const MachineInstr *MI, int &FrameIndex) const;
unsigned isStoreFromStackSlotPostFE(const MachineInstr *MI,
int &FrameIndex) const;
bool hasStoreFromStackSlot(const MachineInstr *MI,
const MachineMemOperand *&MMO,
int &FrameIndex) const;
#if 0
void reMaterialize(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned DestReg, unsigned SubIdx,
const MachineInstr *Orig,
const TargetRegisterInfo *TRI) const;
MachineInstr *duplicate(MachineInstr *Orig,
MachineFunction &MF) const;
#endif
MachineInstr *
convertToThreeAddress(MachineFunction::iterator &MFI,
MachineBasicBlock::iterator &MBBI,
LiveVariables *LV) const;
#if 0
MachineInstr *commuteInstruction(MachineInstr *MI,
bool NewMI = false) const;
bool findCommutedOpIndices(MachineInstr *MI, unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2) const;
bool produceSameValue(const MachineInstr *MI0,
const MachineInstr *MI1) const;
#endif
bool AnalyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const;
unsigned RemoveBranch(MachineBasicBlock &MBB) const;
unsigned
InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
MachineBasicBlock *FBB,
const SmallVectorImpl<MachineOperand> &Cond,
DebugLoc DL) const;
bool copyRegToReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I,
unsigned DestReg, unsigned SrcReg,
const TargetRegisterClass *DestRC,
const TargetRegisterClass *SrcRC,
DebugLoc DL) const;
virtual void copyPhysReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI, DebugLoc DL,
unsigned DestReg, unsigned SrcReg,
bool KillSrc) const;
void storeRegToStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned SrcReg, bool isKill, int FrameIndex,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const;
void loadRegFromStackSlot(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI,
unsigned DestReg, int FrameIndex,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const;
protected:
MachineInstr *foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
int FrameIndex) const;
MachineInstr *foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
MachineInstr *LoadMI) const;
public:
bool canFoldMemoryOperand(const MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops) const;
bool unfoldMemoryOperand(MachineFunction &MF, MachineInstr *MI,
unsigned Reg, bool UnfoldLoad, bool UnfoldStore,
SmallVectorImpl<MachineInstr *> &NewMIs) const;
bool unfoldMemoryOperand(SelectionDAG &DAG, SDNode *N,
SmallVectorImpl<SDNode *> &NewNodes) const;
unsigned getOpcodeAfterMemoryUnfold(unsigned Opc,
bool UnfoldLoad, bool UnfoldStore,
unsigned *LoadRegIndex = 0) const;
bool areLoadsFromSameBasePtr(SDNode *Load1, SDNode *Load2,
int64_t &Offset1, int64_t &Offset2) const;
bool shouldScheduleLoadsNear(SDNode *Load1, SDNode *Load2,
int64_t Offset1, int64_t Offset2,
unsigned NumLoads) const;
bool ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const;
void insertNoop(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const;
bool isPredicated(const MachineInstr *MI) const;
#if 0
bool isUnpredicatedTerminator(const MachineInstr *MI) const;
bool PredicateInstruction(MachineInstr *MI,
const SmallVectorImpl<MachineOperand> &Pred) const;
#endif
bool SubsumesPredicate(const SmallVectorImpl<MachineOperand> &Pred1,
const SmallVectorImpl<MachineOperand> &Pred2) const;
bool DefinesPredicate(MachineInstr *MI,
std::vector<MachineOperand> &Pred) const;
bool isPredicable(MachineInstr *MI) const;
bool isSafeToMoveRegClassDefs(const TargetRegisterClass *RC) const;
unsigned GetInstSizeInBytes(const MachineInstr *MI) const;
#if 0
unsigned GetFunctionSizeInBytes(const MachineFunction &MF) const;
unsigned getInlineAsmLength(const char *Str,
const MCAsmInfo &MAI) const;
#endif
};
}
#endif // AMDILINSTRUCTIONINFO_H_


@@ -0,0 +1,115 @@
//===------------ AMDILInstrInfo.td - AMDIL Target ------*-tablegen-*------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file describes the AMDIL instructions in TableGen format.
//
//===----------------------------------------------------------------------===//
// AMDIL Instruction Predicate Definitions
// Predicate that is set to true if the hardware supports double precision
// divide
def HasHWDDiv : Predicate<"Subtarget.device()"
"->getGeneration() > AMDILDeviceInfo::HD4XXX && "
"Subtarget.device()->usesHardware(AMDILDeviceInfo::DoubleOps)">;
// Predicate that is set to true if the hardware supports double, but not double
// precision divide in hardware
def HasSWDDiv : Predicate<"Subtarget.device()"
"->getGeneration() == AMDILDeviceInfo::HD4XXX &&"
"Subtarget.device()->usesHardware(AMDILDeviceInfo::DoubleOps)">;
// Predicate that is set to true if the hardware supports 24-bit signed
// math ops. Otherwise a software expansion to 32-bit math ops is used.
def HasHWSign24Bit : Predicate<"Subtarget.device()"
"->getGeneration() > AMDILDeviceInfo::HD5XXX">;
// Predicates for whether 64-bit operations are handled in hardware or software
def HasHW64Bit : Predicate<"Subtarget.device()"
"->usesHardware(AMDILDeviceInfo::LongOps)">;
def HasSW64Bit : Predicate<"Subtarget.device()"
"->usesSoftware(AMDILDeviceInfo::LongOps)">;
// Predicate that is set to true if the timer register is supported
def HasTmrRegister : Predicate<"Subtarget.device()"
"->isSupported(AMDILDeviceInfo::TmrReg)">;
// Predicate that is true if we are at least the Evergreen series
def HasDeviceIDInst : Predicate<"Subtarget.device()"
"->getGeneration() >= AMDILDeviceInfo::HD5XXX">;
// Predicate that is true if we have region address space.
def hasRegionAS : Predicate<"Subtarget.device()"
"->usesHardware(AMDILDeviceInfo::RegionMem)">;
// Predicate that is true if we do not have the region address space.
def noRegionAS : Predicate<"!Subtarget.device()"
"->isSupported(AMDILDeviceInfo::RegionMem)">;
// Predicate that is set to true if 64-bit multiply is supported in the IL
def HasHW64Mul : Predicate<"Subtarget.calVersion()"
">= CAL_VERSION_SC_139"
"&& Subtarget.device()"
"->getGeneration() >="
"AMDILDeviceInfo::HD5XXX">;
def HasSW64Mul : Predicate<"Subtarget.calVersion()"
"< CAL_VERSION_SC_139">;
// Predicate that is set to true if 64-bit div/mod is supported in the IL
def HasHW64DivMod : Predicate<"Subtarget.device()"
"->usesHardware(AMDILDeviceInfo::HW64BitDivMod)">;
def HasSW64DivMod : Predicate<"Subtarget.device()"
"->usesSoftware(AMDILDeviceInfo::HW64BitDivMod)">;
// Predicate that is set to true if 64-bit pointers are used.
def Has64BitPtr : Predicate<"Subtarget.is64bit()">;
def Has32BitPtr : Predicate<"!Subtarget.is64bit()">;
//===--------------------------------------------------------------------===//
// Custom Operands
//===--------------------------------------------------------------------===//
include "AMDILOperands.td"
//===--------------------------------------------------------------------===//
// Custom Selection DAG Type Profiles
//===--------------------------------------------------------------------===//
include "AMDILProfiles.td"
//===--------------------------------------------------------------------===//
// Custom Selection DAG Nodes
//===--------------------------------------------------------------------===//
include "AMDILNodes.td"
//===--------------------------------------------------------------------===//
// Custom Pattern DAG Nodes
//===--------------------------------------------------------------------===//
include "AMDILPatterns.td"
//===----------------------------------------------------------------------===//
// Instruction format classes
//===----------------------------------------------------------------------===//
include "AMDILFormats.td"
//===--------------------------------------------------------------------===//
// Multiclass Instruction formats
//===--------------------------------------------------------------------===//
include "AMDILMultiClass.td"
//===--------------------------------------------------------------------===//
// Intrinsics support
//===--------------------------------------------------------------------===//
include "AMDILIntrinsics.td"
//===--------------------------------------------------------------------===//
// Instructions support
//===--------------------------------------------------------------------===//
include "AMDILInstructions.td"
//===--------------------------------------------------------------------===//
// Instruction Pattern support - This Must be the last include in the file
// as it requires items defined in other files
//===--------------------------------------------------------------------===//
include "AMDILInstrPatterns.td"


@@ -0,0 +1,66 @@
//===- AMDILInstrPatterns.td - AMDIL Target ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//===--------------------------------------------------------------------===//
// This file holds all the custom patterns that are used by the amdil backend
//
//===--------------------------------------------------------------------===//
//===--------------------------------------------------------------------===//
// Custom patterns for conversion operations
//===--------------------------------------------------------------------===//
// Pattern to map the integer 'or' operation to IL_or
def : Pat<(i32 (or GPRI32:$src0, GPRI32:$src1)),
(i32 (BINARY_OR_i32 GPRI32:$src0, GPRI32:$src1))>;
// float ==> long patterns
// unsigned: f32 -> i64
def FTOUL : Pat<(i64 (fp_to_uint GPRF32:$src)),
(LCREATE (FTOU GPRF32:$src), (LOADCONST_i32 0))>;
// signed: f32 -> i64
def FTOL : Pat<(i64 (fp_to_sint GPRF32:$src)),
(LCREATE (FTOI GPRF32:$src), (LOADCONST_i32 0))>;
// unsigned: i64 -> f32
def ULTOF : Pat<(f32 (uint_to_fp GPRI64:$src)),
(UTOF (LLO GPRI64:$src))>;
// signed: i64 -> f32
def LTOF : Pat<(f32 (sint_to_fp GPRI64:$src)),
(ITOF (LLO GPRI64:$src))>;
// integer subtraction
// a - b ==> a + (-b)
def SUB_i8 : Pat<(sub GPRI8:$src0, GPRI8:$src1),
(ADD_i8 GPRI8:$src0, (NEGATE_i8 GPRI8:$src1))>;
def SUB_v2i8 : Pat<(sub GPRV2I8:$src0, GPRV2I8:$src1),
(ADD_v2i8 GPRV2I8:$src0, (NEGATE_v2i8 GPRV2I8:$src1))>;
def SUB_v4i8 : Pat<(sub GPRV4I8:$src0, GPRV4I8:$src1),
(ADD_v4i8 GPRV4I8:$src0, (NEGATE_v4i8 GPRV4I8:$src1))>;
def SUB_i16 : Pat<(sub GPRI16:$src0, GPRI16:$src1),
(ADD_i16 GPRI16:$src0, (NEGATE_i16 GPRI16:$src1))>;
def SUB_v2i16 : Pat<(sub GPRV2I16:$src0, GPRV2I16:$src1),
(ADD_v2i16 GPRV2I16:$src0, (NEGATE_v2i16 GPRV2I16:$src1))>;
def SUB_v4i16 : Pat<(sub GPRV4I16:$src0, GPRV4I16:$src1),
(ADD_v4i16 GPRV4I16:$src0, (NEGATE_v4i16 GPRV4I16:$src1))>;
def SUB_i32 : Pat<(sub GPRI32:$src0, GPRI32:$src1),
(ADD_i32 GPRI32:$src0, (NEGATE_i32 GPRI32:$src1))>;
def SUB_v2i32 : Pat<(sub GPRV2I32:$src0, GPRV2I32:$src1),
(ADD_v2i32 GPRV2I32:$src0, (NEGATE_v2i32 GPRV2I32:$src1))>;
def SUB_v4i32 : Pat<(sub GPRV4I32:$src0, GPRV4I32:$src1),
(ADD_v4i32 GPRV4I32:$src0, (NEGATE_v4i32 GPRV4I32:$src1))>;
// LLVM isn't lowering this correctly, so write a pattern that
// matches it instead.
def : Pat<(build_vector (i32 imm:$src)),
(VCREATE_v4i32 (LOADCONST_i32 imm:$src))>;
// Calls:
def : Pat<(IL_call tglobaladdr:$dst),
(CALL tglobaladdr:$dst)>;
def : Pat<(IL_call texternalsym:$dst),
(CALL texternalsym:$dst)>;
def : Pat<(IL_call tconstpool:$dst),
(CALL tconstpool:$dst)>;
include "AMDILConversions.td"
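The `SUB_*` patterns above rely on the two's-complement identity a - b == a + (-b), since AMDIL has no native integer subtract. A minimal C++ check of that identity (the helper name is illustrative, not backend code):

```cpp
#include <cassert>
#include <cstdint>

// Emitting a - b as a + (-b) is exact under two's-complement
// wrap-around arithmetic, which is what the ADD_*/NEGATE_* rewrite
// in the patterns above depends on.
static int32_t subViaAddNegate(int32_t a, int32_t b) {
  uint32_t negB = ~static_cast<uint32_t>(b) + 1u; // two's-complement negate
  return static_cast<int32_t>(static_cast<uint32_t>(a) + negB);
}
```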

File diff suppressed because it is too large.


@@ -0,0 +1,190 @@
//===- AMDILIntrinsicInfo.cpp - AMDIL Intrinsic Information ------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file contains the AMDIL Implementation of the IntrinsicInfo class.
//
//===-----------------------------------------------------------------------===//
#include "AMDILIntrinsicInfo.h"
#include "AMDIL.h"
#include "AMDILTargetMachine.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Intrinsics.h"
#include "llvm/Module.h"
using namespace llvm;
#define GET_LLVM_INTRINSIC_FOR_GCC_BUILTIN
#include "AMDILGenIntrinsics.inc"
#undef GET_LLVM_INTRINSIC_FOR_GCC_BUILTIN
AMDILIntrinsicInfo::AMDILIntrinsicInfo(AMDILTargetMachine *tm)
: TargetIntrinsicInfo(), mTM(tm)
{
}
std::string
AMDILIntrinsicInfo::getName(unsigned int IntrID, Type **Tys,
unsigned int numTys) const
{
static const char* const names[] = {
#define GET_INTRINSIC_NAME_TABLE
#include "AMDILGenIntrinsics.inc"
#undef GET_INTRINSIC_NAME_TABLE
};
//assert(!isOverloaded(IntrID)
//&& "AMDIL Intrinsics are not overloaded");
if (IntrID < Intrinsic::num_intrinsics) {
return std::string();
}
assert(IntrID < AMDGPUIntrinsic::num_AMDIL_intrinsics
&& "Invalid intrinsic ID");
std::string Result(names[IntrID - Intrinsic::num_intrinsics]);
return Result;
}
static bool
checkTruncation(const char *Name, unsigned int& Len)
{
const char *ptr = Name + (Len - 1);
while(ptr != Name && *ptr != '_') {
--ptr;
}
// We don't want to truncate the names of atomic instructions, but we
// do want to report a match so that the caller can translate the
// atomic instruction names if needed.
if (!strncmp(Name, "__atom", 6)) {
return true;
}
if (strstr(ptr, "i32")
|| strstr(ptr, "u32")
|| strstr(ptr, "i64")
|| strstr(ptr, "u64")
|| strstr(ptr, "f32")
|| strstr(ptr, "f64")
|| strstr(ptr, "i16")
|| strstr(ptr, "u16")
|| strstr(ptr, "i8")
|| strstr(ptr, "u8")) {
Len = (unsigned int)(ptr - Name);
return true;
}
return false;
}
// We don't want to support both the OpenCL 1.0 atomics
// and the 1.1 atomics with different names, so we translate
// the 1.0 atomics to the 1.1 naming here if needed.
static char*
atomTranslateIfNeeded(const char *Name, unsigned int Len)
{
char *buffer = NULL;
if (strncmp(Name, "__atom_", 7)) {
// If we are not starting with __atom_, then
// go ahead and continue on with the allocation.
buffer = new char[Len + 1];
memcpy(buffer, Name, Len);
} else {
buffer = new char[Len + 3];
memcpy(buffer, "__atomic_", 9);
memcpy(buffer + 9, Name + 7, Len - 7);
Len += 2;
}
buffer[Len] = '\0';
return buffer;
}
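A `std::string` sketch of the same renaming (hypothetical helper, not backend code): OpenCL 1.0 atomics named `__atom_*` are rewritten to the OpenCL 1.1 `__atomic_*` spelling, and every other name passes through unchanged.

```cpp
#include <cassert>
#include <string>

// Mirrors atomTranslateIfNeeded() above without the manual buffer
// management: rewrite the "__atom_" prefix to "__atomic_".
static std::string translateAtomName(const std::string &name) {
  if (name.compare(0, 7, "__atom_") == 0)
    return "__atomic_" + name.substr(7);
  return name;
}
```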
unsigned int
AMDILIntrinsicInfo::lookupName(const char *Name, unsigned int Len) const
{
#define GET_FUNCTION_RECOGNIZER
#include "AMDILGenIntrinsics.inc"
#undef GET_FUNCTION_RECOGNIZER
AMDGPUIntrinsic::ID IntrinsicID
= (AMDGPUIntrinsic::ID)Intrinsic::not_intrinsic;
if (checkTruncation(Name, Len)) {
char *buffer = atomTranslateIfNeeded(Name, Len);
IntrinsicID = getIntrinsicForGCCBuiltin("AMDIL", buffer);
delete [] buffer;
} else {
IntrinsicID = getIntrinsicForGCCBuiltin("AMDIL", Name);
}
if (!isValidIntrinsic(IntrinsicID)) {
return 0;
}
if (IntrinsicID != (AMDGPUIntrinsic::ID)Intrinsic::not_intrinsic) {
return IntrinsicID;
}
return 0;
}
bool
AMDILIntrinsicInfo::isOverloaded(unsigned id) const
{
// Overload Table
#define GET_INTRINSIC_OVERLOAD_TABLE
#include "AMDILGenIntrinsics.inc"
#undef GET_INTRINSIC_OVERLOAD_TABLE
}
/// This defines the "getAttributes(ID id)" method.
#define GET_INTRINSIC_ATTRIBUTES
#include "AMDILGenIntrinsics.inc"
#undef GET_INTRINSIC_ATTRIBUTES
Function*
AMDILIntrinsicInfo::getDeclaration(Module *M, unsigned IntrID,
Type **Tys,
unsigned numTys) const
{
assert(!isOverloaded(IntrID) && "AMDIL intrinsics are not overloaded");
AttrListPtr AList = getAttributes((AMDGPUIntrinsic::ID) IntrID);
LLVMContext& Context = M->getContext();
unsigned int id = IntrID;
Type *ResultTy = NULL;
std::vector<Type*> ArgTys;
bool IsVarArg = false;
#define GET_INTRINSIC_GENERATOR
#include "AMDILGenIntrinsics.inc"
#undef GET_INTRINSIC_GENERATOR
// We need to add the resource ID argument for atomics.
if (id >= AMDGPUIntrinsic::AMDIL_atomic_add_gi32
&& id <= AMDGPUIntrinsic::AMDIL_atomic_xor_ru32_noret) {
ArgTys.push_back(IntegerType::get(Context, 32));
}
return cast<Function>(M->getOrInsertFunction(getName(IntrID),
FunctionType::get(ResultTy, ArgTys, IsVarArg),
AList));
}
/// Because the code generator has to support different SC versions,
/// this function is added to check that the intrinsic being used
/// is actually valid. In the case where it isn't valid, the
/// function call is not translated into an intrinsic and the
/// fall-back, software-emulated path should pick up the result.
bool
AMDILIntrinsicInfo::isValidIntrinsic(unsigned int IntrID) const
{
const AMDILSubtarget *stm = mTM->getSubtargetImpl();
switch (IntrID) {
default:
return true;
case AMDGPUIntrinsic::AMDIL_convert_f32_i32_rpi:
case AMDGPUIntrinsic::AMDIL_convert_f32_i32_flr:
case AMDGPUIntrinsic::AMDIL_convert_f32_f16_near:
case AMDGPUIntrinsic::AMDIL_convert_f32_f16_neg_inf:
case AMDGPUIntrinsic::AMDIL_convert_f32_f16_plus_inf:
return stm->calVersion() >= CAL_VERSION_SC_139;
}
}


@@ -0,0 +1,49 @@
//===- AMDILIntrinsicInfo.h - AMDIL Intrinsic Information ------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface for the AMDIL Implementation of the Intrinsic Info class.
//
//===-----------------------------------------------------------------------===//
#ifndef _AMDIL_INTRINSICS_H_
#define _AMDIL_INTRINSICS_H_
#include "llvm/Intrinsics.h"
#include "llvm/Target/TargetIntrinsicInfo.h"
namespace llvm {
class AMDILTargetMachine;
namespace AMDGPUIntrinsic {
enum ID {
last_non_AMDIL_intrinsic = Intrinsic::num_intrinsics - 1,
#define GET_INTRINSIC_ENUM_VALUES
#include "AMDILGenIntrinsics.inc"
#undef GET_INTRINSIC_ENUM_VALUES
, num_AMDIL_intrinsics
};
}
class AMDILIntrinsicInfo : public TargetIntrinsicInfo {
AMDILTargetMachine *mTM;
public:
AMDILIntrinsicInfo(AMDILTargetMachine *tm);
std::string getName(unsigned int IntrId, Type **Tys = 0,
unsigned int numTys = 0) const;
unsigned int lookupName(const char *Name, unsigned int Len) const;
bool isOverloaded(unsigned int IID) const;
Function *getDeclaration(Module *M, unsigned int ID,
Type **Tys = 0,
unsigned int numTys = 0) const;
bool isValidIntrinsic(unsigned int) const;
}; // AMDILIntrinsicInfo
}
#endif // _AMDIL_INTRINSICS_H_


@@ -0,0 +1,705 @@
//===- AMDILIntrinsics.td - Defines AMDIL Intrinsics -*- tablegen -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file defines all of the amdil-specific intrinsics
//
//===---------------------------------------------------------------===//
let TargetPrefix = "AMDIL", isTarget = 1 in {
//------------- Synchronization Functions - OpenCL 6.11.9 --------------------//
def int_AMDIL_fence : GCCBuiltin<"mem_fence">,
UnaryIntNoRetInt;
def int_AMDIL_fence_global : GCCBuiltin<"mem_fence_global">,
UnaryIntNoRetInt;
def int_AMDIL_fence_local : GCCBuiltin<"mem_fence_local">,
UnaryIntNoRetInt;
def int_AMDIL_fence_region : GCCBuiltin<"mem_fence_region">,
UnaryIntNoRetInt;
def int_AMDIL_fence_read_only : GCCBuiltin<"read_mem_fence">,
UnaryIntNoRetInt;
def int_AMDIL_fence_read_only_global : GCCBuiltin<"read_mem_fence_global">,
UnaryIntNoRetInt;
def int_AMDIL_fence_read_only_local : GCCBuiltin<"read_mem_fence_local">,
UnaryIntNoRetInt;
def int_AMDIL_fence_read_only_region : GCCBuiltin<"read_mem_fence_region">,
UnaryIntNoRetInt;
def int_AMDIL_fence_write_only : GCCBuiltin<"write_mem_fence">,
UnaryIntNoRetInt;
def int_AMDIL_fence_write_only_global : GCCBuiltin<"write_mem_fence_global">,
UnaryIntNoRetInt;
def int_AMDIL_fence_write_only_local : GCCBuiltin<"write_mem_fence_local">,
UnaryIntNoRetInt;
def int_AMDIL_fence_write_only_region : GCCBuiltin<"write_mem_fence_region">,
UnaryIntNoRetInt;
def int_AMDIL_early_exit : GCCBuiltin<"__amdil_early_exit">,
UnaryIntNoRetInt;
def int_AMDIL_cmov_logical : GCCBuiltin<"__amdil_cmov_logical">,
TernaryIntInt;
def int_AMDIL_fabs : GCCBuiltin<"__amdil_fabs">, UnaryIntFloat;
def int_AMDIL_abs : GCCBuiltin<"__amdil_abs">, UnaryIntInt;
def int_AMDIL_bit_extract_i32 : GCCBuiltin<"__amdil_ibit_extract">,
TernaryIntInt;
def int_AMDIL_bit_extract_u32 : GCCBuiltin<"__amdil_ubit_extract">,
TernaryIntInt;
def int_AMDIL_bit_reverse_u32 : GCCBuiltin<"__amdil_ubit_reverse">,
UnaryIntInt;
def int_AMDIL_bit_count_i32 : GCCBuiltin<"__amdil_count_bits">,
UnaryIntInt;
def int_AMDIL_bit_find_first_lo : GCCBuiltin<"__amdil_ffb_lo">,
UnaryIntInt;
def int_AMDIL_bit_find_first_hi : GCCBuiltin<"__amdil_ffb_hi">,
UnaryIntInt;
def int_AMDIL_bit_find_first_sgn : GCCBuiltin<"__amdil_ffb_signed">,
UnaryIntInt;
def int_AMDIL_media_bitalign : GCCBuiltin<"__amdil_bitalign">,
TernaryIntInt;
def int_AMDIL_media_bytealign : GCCBuiltin<"__amdil_bytealign">,
TernaryIntInt;
def int_AMDIL_bit_insert_u32 : GCCBuiltin<"__amdil_ubit_insert">,
QuaternaryIntInt;
def int_AMDIL_bfi : GCCBuiltin<"__amdil_bfi">,
TernaryIntInt;
def int_AMDIL_bfm : GCCBuiltin<"__amdil_bfm">,
BinaryIntInt;
def int_AMDIL_mad_i32 : GCCBuiltin<"__amdil_imad">,
TernaryIntInt;
def int_AMDIL_mad_u32 : GCCBuiltin<"__amdil_umad">,
TernaryIntInt;
def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">,
TernaryIntFloat;
def int_AMDIL_mulhi_i32 : GCCBuiltin<"__amdil_imul_high">,
BinaryIntInt;
def int_AMDIL_mulhi_u32 : GCCBuiltin<"__amdil_umul_high">,
BinaryIntInt;
def int_AMDIL_mul24_i32 : GCCBuiltin<"__amdil_imul24">,
BinaryIntInt;
def int_AMDIL_mul24_u32 : GCCBuiltin<"__amdil_umul24">,
BinaryIntInt;
def int_AMDIL_mulhi24_i32 : GCCBuiltin<"__amdil_imul24_high">,
BinaryIntInt;
def int_AMDIL_mulhi24_u32 : GCCBuiltin<"__amdil_umul24_high">,
BinaryIntInt;
def int_AMDIL_mad24_i32 : GCCBuiltin<"__amdil_imad24">,
TernaryIntInt;
def int_AMDIL_mad24_u32 : GCCBuiltin<"__amdil_umad24">,
TernaryIntInt;
def int_AMDIL_carry_i32 : GCCBuiltin<"__amdil_carry">,
BinaryIntInt;
def int_AMDIL_borrow_i32 : GCCBuiltin<"__amdil_borrow">,
BinaryIntInt;
def int_AMDIL_min_i32 : GCCBuiltin<"__amdil_imin">,
BinaryIntInt;
def int_AMDIL_min_u32 : GCCBuiltin<"__amdil_umin">,
BinaryIntInt;
def int_AMDIL_min : GCCBuiltin<"__amdil_min">,
BinaryIntFloat;
def int_AMDIL_max_i32 : GCCBuiltin<"__amdil_imax">,
BinaryIntInt;
def int_AMDIL_max_u32 : GCCBuiltin<"__amdil_umax">,
BinaryIntInt;
def int_AMDIL_max : GCCBuiltin<"__amdil_max">,
BinaryIntFloat;
def int_AMDIL_media_lerp_u4 : GCCBuiltin<"__amdil_u4lerp">,
TernaryIntInt;
def int_AMDIL_media_sad : GCCBuiltin<"__amdil_sad">,
TernaryIntInt;
def int_AMDIL_media_sad_hi : GCCBuiltin<"__amdil_sadhi">,
TernaryIntInt;
def int_AMDIL_fraction : GCCBuiltin<"__amdil_fraction">,
UnaryIntFloat;
def int_AMDIL_clamp : GCCBuiltin<"__amdil_clamp">,
TernaryIntFloat;
def int_AMDIL_pireduce : GCCBuiltin<"__amdil_pireduce">,
UnaryIntFloat;
def int_AMDIL_round_nearest : GCCBuiltin<"__amdil_round_nearest">,
UnaryIntFloat;
def int_AMDIL_round_neginf : GCCBuiltin<"__amdil_round_neginf">,
UnaryIntFloat;
def int_AMDIL_round_posinf : GCCBuiltin<"__amdil_round_posinf">,
UnaryIntFloat;
def int_AMDIL_round_zero : GCCBuiltin<"__amdil_round_zero">,
UnaryIntFloat;
def int_AMDIL_acos : GCCBuiltin<"__amdil_acos">,
UnaryIntFloat;
def int_AMDIL_atan : GCCBuiltin<"__amdil_atan">,
UnaryIntFloat;
def int_AMDIL_asin : GCCBuiltin<"__amdil_asin">,
UnaryIntFloat;
def int_AMDIL_cos : GCCBuiltin<"__amdil_cos">,
UnaryIntFloat;
def int_AMDIL_cos_vec : GCCBuiltin<"__amdil_cos_vec">,
UnaryIntFloat;
def int_AMDIL_tan : GCCBuiltin<"__amdil_tan">,
UnaryIntFloat;
def int_AMDIL_sin : GCCBuiltin<"__amdil_sin">,
UnaryIntFloat;
def int_AMDIL_sin_vec : GCCBuiltin<"__amdil_sin_vec">,
UnaryIntFloat;
def int_AMDIL_pow : GCCBuiltin<"__amdil_pow">, BinaryIntFloat;
def int_AMDIL_div : GCCBuiltin<"__amdil_div">, BinaryIntFloat;
def int_AMDIL_udiv : GCCBuiltin<"__amdil_udiv">, BinaryIntInt;
def int_AMDIL_sqrt: GCCBuiltin<"__amdil_sqrt">,
UnaryIntFloat;
def int_AMDIL_sqrt_vec: GCCBuiltin<"__amdil_sqrt_vec">,
UnaryIntFloat;
def int_AMDIL_exp : GCCBuiltin<"__amdil_exp">,
UnaryIntFloat;
def int_AMDIL_exp_vec : GCCBuiltin<"__amdil_exp_vec">,
UnaryIntFloat;
def int_AMDIL_exn : GCCBuiltin<"__amdil_exn">,
UnaryIntFloat;
def int_AMDIL_log : GCCBuiltin<"__amdil_log">,
UnaryIntFloat;
def int_AMDIL_log_vec : GCCBuiltin<"__amdil_log_vec">,
UnaryIntFloat;
def int_AMDIL_ln : GCCBuiltin<"__amdil_ln">,
UnaryIntFloat;
def int_AMDIL_sign: GCCBuiltin<"__amdil_sign">,
UnaryIntFloat;
def int_AMDIL_fma: GCCBuiltin<"__amdil_fma">,
TernaryIntFloat;
def int_AMDIL_rsq : GCCBuiltin<"__amdil_rsq">,
UnaryIntFloat;
def int_AMDIL_rsq_vec : GCCBuiltin<"__amdil_rsq_vec">,
UnaryIntFloat;
def int_AMDIL_length : GCCBuiltin<"__amdil_length">,
UnaryIntFloat;
def int_AMDIL_lerp : GCCBuiltin<"__amdil_lerp">,
TernaryIntFloat;
def int_AMDIL_media_sad4 : GCCBuiltin<"__amdil_sad4">,
Intrinsic<[llvm_i32_ty], [llvm_v4i32_ty,
llvm_v4i32_ty, llvm_i32_ty], []>;
def int_AMDIL_frexp_f64 : GCCBuiltin<"__amdil_frexp">,
Intrinsic<[llvm_v2i64_ty], [llvm_double_ty], []>;
def int_AMDIL_ldexp : GCCBuiltin<"__amdil_ldexp">,
Intrinsic<[llvm_anyfloat_ty], [llvm_anyfloat_ty, llvm_anyint_ty], []>;
def int_AMDIL_drcp : GCCBuiltin<"__amdil_rcp">,
Intrinsic<[llvm_double_ty], [llvm_double_ty], []>;
def int_AMDIL_convert_f16_f32 : GCCBuiltin<"__amdil_half_to_float">,
ConvertIntITOF;
def int_AMDIL_convert_f32_f16 : GCCBuiltin<"__amdil_float_to_half">,
ConvertIntFTOI;
def int_AMDIL_convert_f32_i32_rpi : GCCBuiltin<"__amdil_float_to_int_rpi">,
ConvertIntFTOI;
def int_AMDIL_convert_f32_i32_flr : GCCBuiltin<"__amdil_float_to_int_flr">,
ConvertIntFTOI;
def int_AMDIL_convert_f32_f16_near : GCCBuiltin<"__amdil_float_to_half_near">,
ConvertIntFTOI;
def int_AMDIL_convert_f32_f16_neg_inf : GCCBuiltin<"__amdil_float_to_half_neg_inf">,
ConvertIntFTOI;
def int_AMDIL_convert_f32_f16_plus_inf : GCCBuiltin<"__amdil_float_to_half_plus_inf">,
ConvertIntFTOI;
def int_AMDIL_media_convert_f2v4u8 : GCCBuiltin<"__amdil_f_2_u4">,
Intrinsic<[llvm_i32_ty], [llvm_v4f32_ty], []>;
def int_AMDIL_media_unpack_byte_0 : GCCBuiltin<"__amdil_unpack_0">,
ConvertIntITOF;
def int_AMDIL_media_unpack_byte_1 : GCCBuiltin<"__amdil_unpack_1">,
ConvertIntITOF;
def int_AMDIL_media_unpack_byte_2 : GCCBuiltin<"__amdil_unpack_2">,
ConvertIntITOF;
def int_AMDIL_media_unpack_byte_3 : GCCBuiltin<"__amdil_unpack_3">,
ConvertIntITOF;
def int_AMDIL_dp2_add : GCCBuiltin<"__amdil_dp2_add">,
Intrinsic<[llvm_float_ty], [llvm_v2f32_ty,
llvm_v2f32_ty, llvm_float_ty], []>;
def int_AMDIL_dp2 : GCCBuiltin<"__amdil_dp2">,
Intrinsic<[llvm_float_ty], [llvm_v2f32_ty,
llvm_v2f32_ty], []>;
def int_AMDIL_dp3 : GCCBuiltin<"__amdil_dp3">,
Intrinsic<[llvm_float_ty], [llvm_v4f32_ty,
llvm_v4f32_ty], []>;
def int_AMDIL_dp4 : GCCBuiltin<"__amdil_dp4">,
Intrinsic<[llvm_float_ty], [llvm_v4f32_ty,
llvm_v4f32_ty], []>;
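A def named `int_AMDIL_dp4` yields an intrinsic callable from LLVM IR as `llvm.AMDIL.dp4` (TableGen maps underscores in the `int_` name to dots). A minimal sketch of a call site, assuming the float/vector types declared above; the function name `@dot` is purely illustrative:

```llvm
; Hypothetical use of the dp4 intrinsic defined above:
; four-component dot product of two <4 x float> vectors.
declare float @llvm.AMDIL.dp4(<4 x float>, <4 x float>)

define float @dot(<4 x float> %a, <4 x float> %b) {
  %d = call float @llvm.AMDIL.dp4(<4 x float> %a, <4 x float> %b)
  ret float %d
}
```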
//===---------------------- Image functions begin ------------------------===//
def int_AMDIL_image1d_write : GCCBuiltin<"__amdil_image1d_write">,
Intrinsic<[], [llvm_ptr_ty, llvm_v2i32_ty, llvm_v4i32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_read_norm : GCCBuiltin<"__amdil_image1d_read_norm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_read_unnorm : GCCBuiltin<"__amdil_image1d_read_unnorm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_info0 : GCCBuiltin<"__amdil_image1d_info0">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image1d_info1 : GCCBuiltin<"__amdil_image1d_info1">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image1d_array_write : GCCBuiltin<"__amdil_image1d_array_write">,
Intrinsic<[], [llvm_ptr_ty, llvm_v2i32_ty, llvm_v4i32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_array_read_norm : GCCBuiltin<"__amdil_image1d_array_read_norm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_array_read_unnorm : GCCBuiltin<"__amdil_image1d_array_read_unnorm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image1d_array_info0 : GCCBuiltin<"__amdil_image1d_array_info0">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image1d_array_info1 : GCCBuiltin<"__amdil_image1d_array_info1">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image2d_write : GCCBuiltin<"__amdil_image2d_write">,
Intrinsic<[], [llvm_ptr_ty, llvm_v2i32_ty, llvm_v4i32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_read_norm : GCCBuiltin<"__amdil_image2d_read_norm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_read_unnorm : GCCBuiltin<"__amdil_image2d_read_unnorm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_info0 : GCCBuiltin<"__amdil_image2d_info0">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image2d_info1 : GCCBuiltin<"__amdil_image2d_info1">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image2d_array_write : GCCBuiltin<"__amdil_image2d_array_write">,
Intrinsic<[], [llvm_ptr_ty, llvm_v2i32_ty, llvm_v4i32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_array_read_norm : GCCBuiltin<"__amdil_image2d_array_read_norm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_array_read_unnorm : GCCBuiltin<"__amdil_image2d_array_read_unnorm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image2d_array_info0 : GCCBuiltin<"__amdil_image2d_array_info0">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image2d_array_info1 : GCCBuiltin<"__amdil_image2d_array_info1">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image3d_write : GCCBuiltin<"__amdil_image3d_write">,
Intrinsic<[], [llvm_ptr_ty, llvm_v4i32_ty, llvm_v4i32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image3d_read_norm : GCCBuiltin<"__amdil_image3d_read_norm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image3d_read_unnorm : GCCBuiltin<"__amdil_image3d_read_unnorm">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty, llvm_i32_ty, llvm_v4f32_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_image3d_info0 : GCCBuiltin<"__amdil_image3d_info0">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
def int_AMDIL_image3d_info1 : GCCBuiltin<"__amdil_image3d_info1">,
Intrinsic<[llvm_v4i32_ty], [llvm_ptr_ty], []>;
//===---------------------- Image functions end --------------------------===//
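The image intrinsics above carry `IntrReadWriteArgMem`, telling the optimizer they only touch memory reachable through their pointer argument. A hedged IR sketch of the 2D write, assuming `llvm_ptr_ty` lowers to `i8*` in this LLVM branch (the exact pointer type is an assumption):

```llvm
; Hypothetical call site for image2d_write: resource pointer,
; <2 x i32> coordinate, <4 x i32> texel data, no return value.
declare void @llvm.AMDIL.image2d.write(i8*, <2 x i32>, <4 x i32>)

define void @store_texel(i8* %img, <2 x i32> %coord, <4 x i32> %texel) {
  call void @llvm.AMDIL.image2d.write(i8* %img, <2 x i32> %coord, <4 x i32> %texel)
  ret void
}
```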
def int_AMDIL_append_alloc_i32 : GCCBuiltin<"__amdil_append_alloc">,
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_append_consume_i32 : GCCBuiltin<"__amdil_append_consume">,
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_append_alloc_i32_noret : GCCBuiltin<"__amdil_append_alloc_noret">,
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_append_consume_i32_noret : GCCBuiltin<"__amdil_append_consume_noret">,
Intrinsic<[llvm_i32_ty], [llvm_ptr_ty], [IntrReadWriteArgMem]>;
def int_AMDIL_get_global_id : GCCBuiltin<"__amdil_get_global_id_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_local_id : GCCBuiltin<"__amdil_get_local_id_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_group_id : GCCBuiltin<"__amdil_get_group_id_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_num_groups : GCCBuiltin<"__amdil_get_num_groups_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_local_size : GCCBuiltin<"__amdil_get_local_size_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_global_size : GCCBuiltin<"__amdil_get_global_size_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_global_offset : GCCBuiltin<"__amdil_get_global_offset_int">,
Intrinsic<[llvm_v4i32_ty], [], []>;
def int_AMDIL_get_work_dim : GCCBuiltin<"get_work_dim">,
Intrinsic<[llvm_i32_ty], [], []>;
def int_AMDIL_get_printf_offset : GCCBuiltin<"__amdil_get_printf_offset">,
Intrinsic<[llvm_i32_ty], []>;
def int_AMDIL_get_printf_size : GCCBuiltin<"__amdil_get_printf_size">,
Intrinsic<[llvm_i32_ty], []>;
/// Intrinsics for atomic instructions with no return value
/// Signed 32 bit integer atomics for global address space
def int_AMDIL_atomic_add_gi32_noret : GCCBuiltin<"__atomic_add_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_gi32_noret : GCCBuiltin<"__atomic_sub_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_gi32_noret : GCCBuiltin<"__atomic_rsub_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_gi32_noret : GCCBuiltin<"__atomic_xchg_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_gi32_noret : GCCBuiltin<"__atomic_inc_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_gi32_noret : GCCBuiltin<"__atomic_dec_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_gi32_noret : GCCBuiltin<"__atomic_cmpxchg_gi32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_gi32_noret : GCCBuiltin<"__atomic_min_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_gi32_noret : GCCBuiltin<"__atomic_max_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_gi32_noret : GCCBuiltin<"__atomic_and_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_gi32_noret : GCCBuiltin<"__atomic_or_gi32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_gi32_noret : GCCBuiltin<"__atomic_xor_gi32_noret">,
BinaryAtomicIntNoRet;
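`BinaryAtomicIntNoRet` presumably takes a pointer plus one integer operand and returns nothing, which fits atomics whose result is unused. A sketch of one such call, assuming the global address space is `addrspace(1)` and a `(ptr, i32) -> void` signature (both are assumptions about this backend, not confirmed by the listing):

```llvm
; Hypothetical fire-and-forget atomic add to a global counter,
; using the _noret variant so no old value is materialized.
declare void @llvm.AMDIL.atomic.add.gi32.noret(i32 addrspace(1)*, i32)

define void @bump(i32 addrspace(1)* %counter) {
  call void @llvm.AMDIL.atomic.add.gi32.noret(i32 addrspace(1)* %counter, i32 1)
  ret void
}
```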
/// Unsigned 32 bit integer atomics for global address space
def int_AMDIL_atomic_add_gu32_noret : GCCBuiltin<"__atomic_add_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_gu32_noret : GCCBuiltin<"__atomic_sub_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_gu32_noret : GCCBuiltin<"__atomic_rsub_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_gu32_noret : GCCBuiltin<"__atomic_xchg_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_gu32_noret : GCCBuiltin<"__atomic_inc_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_gu32_noret : GCCBuiltin<"__atomic_dec_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_gu32_noret : GCCBuiltin<"__atomic_cmpxchg_gu32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_gu32_noret : GCCBuiltin<"__atomic_min_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_gu32_noret : GCCBuiltin<"__atomic_max_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_gu32_noret : GCCBuiltin<"__atomic_and_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_gu32_noret : GCCBuiltin<"__atomic_or_gu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_gu32_noret : GCCBuiltin<"__atomic_xor_gu32_noret">,
BinaryAtomicIntNoRet;
/// Intrinsics for atomic instructions with a return value
/// Signed 32 bit integer atomics for global address space
def int_AMDIL_atomic_add_gi32 : GCCBuiltin<"__atomic_add_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_gi32 : GCCBuiltin<"__atomic_sub_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_gi32 : GCCBuiltin<"__atomic_rsub_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_gi32 : GCCBuiltin<"__atomic_xchg_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_gi32 : GCCBuiltin<"__atomic_inc_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_gi32 : GCCBuiltin<"__atomic_dec_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_gi32 : GCCBuiltin<"__atomic_cmpxchg_gi32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_gi32 : GCCBuiltin<"__atomic_min_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_gi32 : GCCBuiltin<"__atomic_max_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_gi32 : GCCBuiltin<"__atomic_and_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_gi32 : GCCBuiltin<"__atomic_or_gi32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xor_gi32 : GCCBuiltin<"__atomic_xor_gi32">,
BinaryAtomicInt;
/// 32 bit float atomics required by OpenCL
def int_AMDIL_atomic_xchg_gf32 : GCCBuiltin<"__atomic_xchg_gf32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_gf32_noret : GCCBuiltin<"__atomic_xchg_gf32_noret">,
BinaryAtomicIntNoRet;
/// Unsigned 32 bit integer atomics for global address space
def int_AMDIL_atomic_add_gu32 : GCCBuiltin<"__atomic_add_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_gu32 : GCCBuiltin<"__atomic_sub_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_gu32 : GCCBuiltin<"__atomic_rsub_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_gu32 : GCCBuiltin<"__atomic_xchg_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_gu32 : GCCBuiltin<"__atomic_inc_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_gu32 : GCCBuiltin<"__atomic_dec_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_gu32 : GCCBuiltin<"__atomic_cmpxchg_gu32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_gu32 : GCCBuiltin<"__atomic_min_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_gu32 : GCCBuiltin<"__atomic_max_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_gu32 : GCCBuiltin<"__atomic_and_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_gu32 : GCCBuiltin<"__atomic_or_gu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xor_gu32 : GCCBuiltin<"__atomic_xor_gu32">,
BinaryAtomicInt;
/// Intrinsics for atomic instructions with no return value
/// Signed 32 bit integer atomics for local address space
def int_AMDIL_atomic_add_li32_noret : GCCBuiltin<"__atomic_add_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_li32_noret : GCCBuiltin<"__atomic_sub_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_li32_noret : GCCBuiltin<"__atomic_rsub_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_li32_noret : GCCBuiltin<"__atomic_xchg_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_li32_noret : GCCBuiltin<"__atomic_inc_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_li32_noret : GCCBuiltin<"__atomic_dec_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_li32_noret : GCCBuiltin<"__atomic_cmpxchg_li32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_li32_noret : GCCBuiltin<"__atomic_min_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_li32_noret : GCCBuiltin<"__atomic_max_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_li32_noret : GCCBuiltin<"__atomic_and_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_li32_noret : GCCBuiltin<"__atomic_or_li32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_mskor_li32_noret : GCCBuiltin<"__atomic_mskor_li32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_li32_noret : GCCBuiltin<"__atomic_xor_li32_noret">,
BinaryAtomicIntNoRet;
/// Signed 32 bit integer atomics for region address space
def int_AMDIL_atomic_add_ri32_noret : GCCBuiltin<"__atomic_add_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_ri32_noret : GCCBuiltin<"__atomic_sub_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_ri32_noret : GCCBuiltin<"__atomic_rsub_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_ri32_noret : GCCBuiltin<"__atomic_xchg_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_ri32_noret : GCCBuiltin<"__atomic_inc_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_ri32_noret : GCCBuiltin<"__atomic_dec_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_ri32_noret : GCCBuiltin<"__atomic_cmpxchg_ri32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_ri32_noret : GCCBuiltin<"__atomic_min_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_ri32_noret : GCCBuiltin<"__atomic_max_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_ri32_noret : GCCBuiltin<"__atomic_and_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_ri32_noret : GCCBuiltin<"__atomic_or_ri32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_mskor_ri32_noret : GCCBuiltin<"__atomic_mskor_ri32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_ri32_noret : GCCBuiltin<"__atomic_xor_ri32_noret">,
BinaryAtomicIntNoRet;
/// Unsigned 32 bit integer atomics for local address space
def int_AMDIL_atomic_add_lu32_noret : GCCBuiltin<"__atomic_add_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_lu32_noret : GCCBuiltin<"__atomic_sub_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_lu32_noret : GCCBuiltin<"__atomic_rsub_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_lu32_noret : GCCBuiltin<"__atomic_xchg_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_lu32_noret : GCCBuiltin<"__atomic_inc_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_lu32_noret : GCCBuiltin<"__atomic_dec_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_lu32_noret : GCCBuiltin<"__atomic_cmpxchg_lu32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_lu32_noret : GCCBuiltin<"__atomic_min_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_lu32_noret : GCCBuiltin<"__atomic_max_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_lu32_noret : GCCBuiltin<"__atomic_and_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_lu32_noret : GCCBuiltin<"__atomic_or_lu32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_mskor_lu32_noret : GCCBuiltin<"__atomic_mskor_lu32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_lu32_noret : GCCBuiltin<"__atomic_xor_lu32_noret">,
BinaryAtomicIntNoRet;
/// Unsigned 32 bit integer atomics for region address space
def int_AMDIL_atomic_add_ru32_noret : GCCBuiltin<"__atomic_add_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_sub_ru32_noret : GCCBuiltin<"__atomic_sub_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_rsub_ru32_noret : GCCBuiltin<"__atomic_rsub_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_ru32_noret : GCCBuiltin<"__atomic_xchg_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_inc_ru32_noret : GCCBuiltin<"__atomic_inc_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_dec_ru32_noret : GCCBuiltin<"__atomic_dec_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_cmpxchg_ru32_noret : GCCBuiltin<"__atomic_cmpxchg_ru32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_min_ru32_noret : GCCBuiltin<"__atomic_min_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_max_ru32_noret : GCCBuiltin<"__atomic_max_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_and_ru32_noret : GCCBuiltin<"__atomic_and_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_or_ru32_noret : GCCBuiltin<"__atomic_or_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_mskor_ru32_noret : GCCBuiltin<"__atomic_mskor_ru32_noret">,
TernaryAtomicIntNoRet;
def int_AMDIL_atomic_xor_ru32_noret : GCCBuiltin<"__atomic_xor_ru32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_get_cycle_count : GCCBuiltin<"__amdil_get_cycle_count">,
VoidIntLong;
def int_AMDIL_compute_unit_id : GCCBuiltin<"__amdil_compute_unit_id">,
VoidIntInt;
def int_AMDIL_wavefront_id : GCCBuiltin<"__amdil_wavefront_id">,
VoidIntInt;
/// Intrinsics for atomic instructions with a return value
/// Signed 32 bit integer atomics for local address space
def int_AMDIL_atomic_add_li32 : GCCBuiltin<"__atomic_add_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_li32 : GCCBuiltin<"__atomic_sub_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_li32 : GCCBuiltin<"__atomic_rsub_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_li32 : GCCBuiltin<"__atomic_xchg_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_li32 : GCCBuiltin<"__atomic_inc_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_li32 : GCCBuiltin<"__atomic_dec_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_li32 : GCCBuiltin<"__atomic_cmpxchg_li32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_li32 : GCCBuiltin<"__atomic_min_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_li32 : GCCBuiltin<"__atomic_max_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_li32 : GCCBuiltin<"__atomic_and_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_li32 : GCCBuiltin<"__atomic_or_li32">,
BinaryAtomicInt;
def int_AMDIL_atomic_mskor_li32 : GCCBuiltin<"__atomic_mskor_li32">,
TernaryAtomicInt;
def int_AMDIL_atomic_xor_li32 : GCCBuiltin<"__atomic_xor_li32">,
BinaryAtomicInt;
/// Signed 32 bit integer atomics for region address space
def int_AMDIL_atomic_add_ri32 : GCCBuiltin<"__atomic_add_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_ri32 : GCCBuiltin<"__atomic_sub_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_ri32 : GCCBuiltin<"__atomic_rsub_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_ri32 : GCCBuiltin<"__atomic_xchg_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_ri32 : GCCBuiltin<"__atomic_inc_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_ri32 : GCCBuiltin<"__atomic_dec_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_ri32 : GCCBuiltin<"__atomic_cmpxchg_ri32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_ri32 : GCCBuiltin<"__atomic_min_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_ri32 : GCCBuiltin<"__atomic_max_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_ri32 : GCCBuiltin<"__atomic_and_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_ri32 : GCCBuiltin<"__atomic_or_ri32">,
BinaryAtomicInt;
def int_AMDIL_atomic_mskor_ri32 : GCCBuiltin<"__atomic_mskor_ri32">,
TernaryAtomicInt;
def int_AMDIL_atomic_xor_ri32 : GCCBuiltin<"__atomic_xor_ri32">,
BinaryAtomicInt;
/// 32 bit float atomics required by OpenCL
def int_AMDIL_atomic_xchg_lf32 : GCCBuiltin<"__atomic_xchg_lf32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_lf32_noret : GCCBuiltin<"__atomic_xchg_lf32_noret">,
BinaryAtomicIntNoRet;
def int_AMDIL_atomic_xchg_rf32 : GCCBuiltin<"__atomic_xchg_rf32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_rf32_noret : GCCBuiltin<"__atomic_xchg_rf32_noret">,
BinaryAtomicIntNoRet;
/// Unsigned 32 bit integer atomics for local address space
def int_AMDIL_atomic_add_lu32 : GCCBuiltin<"__atomic_add_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_lu32 : GCCBuiltin<"__atomic_sub_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_lu32 : GCCBuiltin<"__atomic_rsub_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_lu32 : GCCBuiltin<"__atomic_xchg_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_lu32 : GCCBuiltin<"__atomic_inc_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_lu32 : GCCBuiltin<"__atomic_dec_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_lu32 : GCCBuiltin<"__atomic_cmpxchg_lu32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_lu32 : GCCBuiltin<"__atomic_min_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_lu32 : GCCBuiltin<"__atomic_max_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_lu32 : GCCBuiltin<"__atomic_and_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_lu32 : GCCBuiltin<"__atomic_or_lu32">,
BinaryAtomicInt;
def int_AMDIL_atomic_mskor_lu32 : GCCBuiltin<"__atomic_mskor_lu32">,
TernaryAtomicInt;
def int_AMDIL_atomic_xor_lu32 : GCCBuiltin<"__atomic_xor_lu32">,
BinaryAtomicInt;
/// Unsigned 32 bit integer atomics for region address space
def int_AMDIL_atomic_add_ru32 : GCCBuiltin<"__atomic_add_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_sub_ru32 : GCCBuiltin<"__atomic_sub_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_rsub_ru32 : GCCBuiltin<"__atomic_rsub_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_xchg_ru32 : GCCBuiltin<"__atomic_xchg_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_inc_ru32 : GCCBuiltin<"__atomic_inc_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_dec_ru32 : GCCBuiltin<"__atomic_dec_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_cmpxchg_ru32 : GCCBuiltin<"__atomic_cmpxchg_ru32">,
TernaryAtomicInt;
def int_AMDIL_atomic_min_ru32 : GCCBuiltin<"__atomic_min_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_max_ru32 : GCCBuiltin<"__atomic_max_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_and_ru32 : GCCBuiltin<"__atomic_and_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_or_ru32 : GCCBuiltin<"__atomic_or_ru32">,
BinaryAtomicInt;
def int_AMDIL_atomic_mskor_ru32 : GCCBuiltin<"__atomic_mskor_ru32">,
TernaryAtomicInt;
def int_AMDIL_atomic_xor_ru32 : GCCBuiltin<"__atomic_xor_ru32">,
BinaryAtomicInt;
/// Semaphore signal/wait/init
def int_AMDIL_semaphore_init : GCCBuiltin<"__amdil_semaphore_init">,
Intrinsic<[], [llvm_ptr_ty, llvm_i32_ty]>;
def int_AMDIL_semaphore_wait : GCCBuiltin<"__amdil_semaphore_wait">,
Intrinsic<[], [llvm_ptr_ty]>;
def int_AMDIL_semaphore_signal : GCCBuiltin<"__amdil_semaphore_signal">,
Intrinsic<[], [llvm_ptr_ty]>;
def int_AMDIL_semaphore_size : GCCBuiltin<"__amdil_max_semaphore_size">,
Intrinsic<[llvm_i32_ty], []>;
}


@@ -0,0 +1,84 @@
//===------------- AMDILKernel.h - AMDIL Kernel Class ----------*- C++ -*--===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// Definition of an AMDILKernel object and the various subclasses that
// are used.
//===----------------------------------------------------------------------===//
#ifndef _AMDIL_KERNEL_H_
#define _AMDIL_KERNEL_H_
#include "AMDIL.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Constant.h"
#include "llvm/Value.h"
namespace llvm {
class AMDILSubtarget;
class AMDILTargetMachine;
/// structure that holds information for a single local/region address array
typedef struct _AMDILArrayMemRec {
uint32_t vecSize; // size of each vector
uint32_t offset; // offset into the memory section
bool isHW; // flag to specify if HW is used or SW is used
bool isRegion; // flag to specify if GDS is used or not
} AMDILArrayMem;
/// structure that holds information about a constant address
/// space pointer that is a kernel argument
typedef struct _AMDILConstPtrRec {
const llvm::Value *base;
uint32_t size;
uint32_t offset;
uint32_t cbNum; // value of 0 means that it does not use hw CB
bool isArray;
bool isArgument;
bool usesHardware;
std::string name;
} AMDILConstPtr;
/// Structure that holds information for all local/region address
/// arrays in the kernel
typedef struct _AMDILLocalArgRec {
llvm::SmallVector<AMDILArrayMem *, DEFAULT_VEC_SLOTS> local;
std::string name; // Kernel Name
} AMDILLocalArg;
/// Structure that holds information for each kernel argument
typedef struct _AMDILkernelArgRec {
uint32_t reqGroupSize[3];
uint32_t reqRegionSize[3];
llvm::SmallVector<uint32_t, DEFAULT_VEC_SLOTS> argInfo;
bool mHasRWG;
bool mHasRWR;
} AMDILKernelAttr;
/// Structure that holds information for each kernel
class AMDILKernel {
public:
AMDILKernel() {}
uint32_t curSize;
uint32_t curRSize;
uint32_t curHWSize;
uint32_t curHWRSize;
uint32_t constSize;
bool mKernel;
std::string mName;
AMDILKernelAttr *sgv;
AMDILLocalArg *lvgv;
llvm::SmallVector<struct _AMDILConstPtrRec, DEFAULT_VEC_SLOTS> constPtr;
uint32_t constSizes[HW_MAX_NUM_CB];
llvm::SmallSet<uint32_t, OPENCL_MAX_READ_IMAGES> readOnly;
llvm::SmallSet<uint32_t, OPENCL_MAX_WRITE_IMAGES> writeOnly;
llvm::SmallVector<std::pair<uint32_t, const llvm::Constant *>,
DEFAULT_VEC_SLOTS> CPOffsets;
typedef llvm::SmallVector<struct _AMDILConstPtrRec, DEFAULT_VEC_SLOTS>::iterator constptr_iterator;
typedef llvm::SmallVector<AMDILArrayMem *, DEFAULT_VEC_SLOTS>::iterator arraymem_iterator;
}; // AMDILKernel
} // end llvm namespace
#endif // _AMDIL_KERNEL_H_

(File diff suppressed because it is too large.)


@@ -0,0 +1,177 @@
//===-- AMDILKernelManager.h - Kernel metadata/ABI manager -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Class that handles the metadata/abi management for the
// ASM printer. Handles the parsing and generation of the metadata
// for each kernel and keeps track of its arguments.
//
//==-----------------------------------------------------------------------===//
#ifndef _AMDILKERNELMANAGER_H_
#define _AMDILKERNELMANAGER_H_
#include "AMDIL.h"
#include "AMDILDevice.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/ValueMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/Function.h"
#include <map>
#include <set>
#include <string>
#define IMAGETYPE_2D 0
#define IMAGETYPE_3D 1
#define RESERVED_LIT_COUNT 6
namespace llvm {
class AMDILGlobalManager;
class AMDILSubtarget;
class AMDILMachineFunctionInfo;
class AMDILTargetMachine;
class AMDILAsmPrinter;
class StructType;
class Value;
class TypeSymbolTable;
class MachineFunction;
class MachineInstr;
class ConstantFP;
class PrintfInfo;
class AMDILKernelManager {
public:
typedef enum {
RELEASE_ONLY,
DEBUG_ONLY,
ALWAYS
} ErrorMsgEnum;
AMDILKernelManager(AMDILTargetMachine *TM, AMDILGlobalManager *GM);
virtual ~AMDILKernelManager();
/// Clear the state of the KernelManager, resetting it to its initial state.
void clear();
void setMF(MachineFunction *MF);
/// Process the specific kernel parsing out the parameter information for the
/// kernel.
void processArgMetadata(llvm::raw_ostream &O,
uint32_t buf, bool kernel);
/// Prints the header for the kernel which includes the groupsize declaration
/// and calculation of the local/group/global IDs.
void printHeader(AMDILAsmPrinter *AsmPrinter, llvm::raw_ostream &O,
const std::string &name);
virtual void printDecls(AMDILAsmPrinter *AsmPrinter, llvm::raw_ostream &O);
virtual void printGroupSize(llvm::raw_ostream &O);
/// Copies the data from the runtime setup constant buffers into registers so
/// that the program can correctly access memory or data that was set by the
/// host program.
void printArgCopies(llvm::raw_ostream &O, AMDILAsmPrinter* RegNames);
/// Prints out the end of the function.
void printFooter(llvm::raw_ostream &O);
/// Prints out the metadata for the specific function depending if it is a
/// kernel or not.
void printMetaData(llvm::raw_ostream &O, uint32_t id, bool isKernel = false);
/// Set bool value on whether to consider the function a kernel or a normal
/// function.
void setKernel(bool kernel);
/// Set the unique ID of the kernel/function.
void setID(uint32_t id);
/// Set the name of the kernel/function.
void setName(const std::string &name);
/// Flag to specify whether the function is a kernel or not.
bool isKernel();
/// Flag that specifies whether this function has a kernel wrapper.
bool wasKernel();
void getIntrinsicSetup(AMDILAsmPrinter *AsmPrinter, llvm::raw_ostream &O);
// Returns whether the compiler needs to insert a write to memory or not.
bool useCompilerWrite(const MachineInstr *MI);
// Set the flag that there exists an image write.
void setImageWrite();
void setOutputInst();
const char *getTypeName(const Type *name, const char * symTab);
void emitLiterals(llvm::raw_ostream &O);
// Set the UAV ID for the specific pointer value. If value is NULL,
// the given ID becomes the default ID.
void setUAVID(const Value *value, uint32_t ID);
// Get the UAV ID for the specific pointer value.
uint32_t getUAVID(const Value *value);
private:
/// Helper function that prints the actual metadata and should only be called
/// by printMetaData.
void printKernelArgs(llvm::raw_ostream &O);
void printCopyStructPrivate(const StructType *ST,
llvm::raw_ostream &O,
size_t stackSize,
uint32_t Buffer,
uint32_t mLitIdx,
uint32_t &counter);
virtual void
printConstantToRegMapping(AMDILAsmPrinter *RegNames,
uint32_t &LII,
llvm::raw_ostream &O,
uint32_t &counter,
uint32_t Buffer,
uint32_t n,
const char *lit = NULL,
uint32_t fcall = 0,
bool isImage = false,
bool isHWCB = false);
void updatePtrArg(llvm::Function::const_arg_iterator Ip,
int numWriteImages,
int raw_uav_buffer,
int counter,
bool isKernel,
const Function *F);
/// Name of the current kernel.
std::string mName;
uint32_t mUniqueID;
bool mIsKernel;
bool mWasKernel;
bool mCompilerWrite;
/// Flag to specify if an image write has occurred, in order to not add a
/// compiler-specific write if no other writes to memory occurred.
bool mHasImageWrite;
bool mHasOutputInst;
/// Map from const Value * to UAV ID.
std::map<const Value *, uint32_t> mValueIDMap;
AMDILTargetMachine * mTM;
const AMDILSubtarget * mSTM;
AMDILGlobalManager * mGM;
/// This is the global offset of the printf string IDs.
MachineFunction *mMF;
AMDILMachineFunctionInfo *mMFI;
}; // class AMDILKernelManager
} // llvm namespace
#endif // _AMDILKERNELMANAGER_H_


@@ -0,0 +1,128 @@
//===--- AMDILLiteralManager.cpp - AMDIL Literal Manager Pass --*- C++ -*--===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#define DEBUG_TYPE "literal_manager"
#include "AMDIL.h"
#include "AMDILAlgorithms.tpp"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILSubtarget.h"
#include "AMDILTargetMachine.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
// The AMDIL Literal Manager traverses all of the LOADCONST instructions and
// converts them from an immediate value to the literal index. The literal
// index is valid IL, but the immediate values are not. The immediate values
// must be aggregated and declared for clarity and to reduce the number of
// literals that are used. It is also illegal to declare the same literal
// twice, so this keeps that from occurring.
namespace {
class AMDILLiteralManager : public MachineFunctionPass {
public:
static char ID;
AMDILLiteralManager(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
virtual const char *getPassName() const;
bool runOnMachineFunction(MachineFunction &MF);
private:
bool trackLiterals(MachineBasicBlock::iterator *bbb);
TargetMachine &TM;
const AMDILSubtarget *mSTM;
AMDILKernelManager *mKM;
AMDILMachineFunctionInfo *mMFI;
int32_t mLitIdx;
bool mChanged;
};
char AMDILLiteralManager::ID = 0;
}
namespace llvm {
FunctionPass *
createAMDILLiteralManager(TargetMachine &tm AMDIL_OPT_LEVEL_DECL) {
return new AMDILLiteralManager(tm AMDIL_OPT_LEVEL_VAR);
}
}
AMDILLiteralManager::AMDILLiteralManager(TargetMachine &tm
AMDIL_OPT_LEVEL_DECL)
: MachineFunctionPass(ID),
TM(tm) {
}
bool AMDILLiteralManager::runOnMachineFunction(MachineFunction &MF) {
mChanged = false;
mMFI = MF.getInfo<AMDILMachineFunctionInfo>();
const AMDILTargetMachine *amdtm =
reinterpret_cast<const AMDILTargetMachine *>(&TM);
mSTM = dynamic_cast<const AMDILSubtarget *>(amdtm->getSubtargetImpl());
mKM = const_cast<AMDILKernelManager *>(mSTM->getKernelManager());
safeNestedForEach(MF.begin(), MF.end(), MF.begin()->begin(),
std::bind1st(std::mem_fun(&AMDILLiteralManager::trackLiterals), this));
return mChanged;
}
bool AMDILLiteralManager::trackLiterals(MachineBasicBlock::iterator *bbb) {
MachineInstr *MI = *bbb;
uint32_t Opcode = MI->getOpcode();
switch(Opcode) {
default:
return false;
case AMDIL::LOADCONST_i8:
case AMDIL::LOADCONST_i16:
case AMDIL::LOADCONST_i32:
case AMDIL::LOADCONST_i64:
case AMDIL::LOADCONST_f32:
case AMDIL::LOADCONST_f64:
break;
};
MachineOperand &dstOp = MI->getOperand(0);
MachineOperand &litOp = MI->getOperand(1);
if (!litOp.isImm() && !litOp.isFPImm()) {
return false;
}
if (!dstOp.isReg()) {
return false;
}
// Change the literal to the correct index for each literal that is found.
if (litOp.isImm()) {
int64_t immVal = litOp.getImm();
uint32_t idx = MI->getOpcode() == AMDIL::LOADCONST_i64
? mMFI->addi64Literal(immVal)
: mMFI->addi32Literal(static_cast<int>(immVal), Opcode);
litOp.ChangeToImmediate(idx);
return false;
}
if (litOp.isFPImm()) {
const ConstantFP *fpVal = litOp.getFPImm();
uint32_t idx = MI->getOpcode() == AMDIL::LOADCONST_f64
? mMFI->addf64Literal(fpVal)
: mMFI->addf32Literal(fpVal);
litOp.ChangeToImmediate(idx);
return false;
}
return false;
}
const char* AMDILLiteralManager::getPassName() const {
return "AMDIL Constant Propagation";
}


@@ -0,0 +1,158 @@
//===---- AMDILMCCodeEmitter.cpp - Convert AMDIL text to AMDIL binary ----===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
//===---------------------------------------------------------------------===//
#define DEBUG_TYPE "amdil-emitter"
#include "AMDIL.h"
#include "AMDILInstrInfo.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/MC/MCCodeEmitter.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;
#if 0
namespace {
class AMDILMCCodeEmitter : public MCCodeEmitter {
AMDILMCCodeEmitter(const AMDILMCCodeEmitter &);// DO NOT IMPLEMENT
void operator=(const AMDILMCCodeEmitter &); // DO NOT IMPLEMENT
const TargetMachine &TM;
const TargetInstrInfo &TII;
MCContext &Ctx;
bool Is64BitMode;
public:
AMDILMCCodeEmitter(TargetMachine &tm, MCContext &ctx, bool is64Bit);
~AMDILMCCodeEmitter();
unsigned getNumFixupKinds() const;
const MCFixupKindInfo& getFixupKindInfo(MCFixupKind Kind) const;
static unsigned GetAMDILRegNum(const MCOperand &MO);
void EmitByte(unsigned char C, unsigned &CurByte, raw_ostream &OS) const;
void EmitConstant(uint64_t Val, unsigned Size, unsigned &CurByte,
raw_ostream &OS) const;
void EmitImmediate(const MCOperand &Disp, unsigned ImmSize,
MCFixupKind FixupKind, unsigned &CurByte, raw_ostream &os,
SmallVectorImpl<MCFixup> &Fixups, int ImmOffset = 0) const;
void EncodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups) const;
}; // class AMDILMCCodeEmitter
}; // anonymous namespace
namespace llvm {
MCCodeEmitter *createAMDILMCCodeEmitter(const Target &,
TargetMachine &TM, MCContext &Ctx)
{
return new AMDILMCCodeEmitter(TM, Ctx, false);
}
}
AMDILMCCodeEmitter::AMDILMCCodeEmitter(TargetMachine &tm, MCContext &ctx
, bool is64Bit)
: TM(tm), TII(*TM.getInstrInfo()), Ctx(ctx)
{
Is64BitMode = is64Bit;
}
AMDILMCCodeEmitter::~AMDILMCCodeEmitter()
{
}
unsigned
AMDILMCCodeEmitter::getNumFixupKinds() const
{
return 0;
}
const MCFixupKindInfo &
AMDILMCCodeEmitter::getFixupKindInfo(MCFixupKind Kind) const
{
// const static MCFixupKindInfo Infos[] = {};
if (Kind < FirstTargetFixupKind) {
return MCCodeEmitter::getFixupKindInfo(Kind);
}
assert(unsigned(Kind - FirstTargetFixupKind) < getNumFixupKinds() &&
"Invalid kind!");
return MCCodeEmitter::getFixupKindInfo(Kind);
// return Infos[Kind - FirstTargetFixupKind];
}
void
AMDILMCCodeEmitter::EmitByte(unsigned char C, unsigned &CurByte,
raw_ostream &OS) const
{
OS << (char) C;
++CurByte;
}
void
AMDILMCCodeEmitter::EmitConstant(uint64_t Val, unsigned Size, unsigned &CurByte,
raw_ostream &OS) const
{
// Output the constant in little endian byte order
for (unsigned i = 0; i != Size; ++i) {
EmitByte(Val & 255, CurByte, OS);
Val >>= 8;
}
}
void
AMDILMCCodeEmitter::EmitImmediate(const MCOperand &DispOp, unsigned ImmSize,
MCFixupKind FixupKind, unsigned &CurByte, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups, int ImmOffset) const
{
// If this is a simple integer displacement that doesn't require a
// relocation, emit it now.
if (DispOp.isImm()) {
EmitConstant(DispOp.getImm() + ImmOffset, ImmSize, CurByte, OS);
return;
}
// If we have an immoffset, add it to the expression
const MCExpr *Expr = DispOp.getExpr();
if (ImmOffset) {
Expr = MCBinaryExpr::CreateAdd(Expr,
MCConstantExpr::Create(ImmOffset, Ctx), Ctx);
}
// Emit a symbolic constant as a fixup and 4 zeros.
Fixups.push_back(MCFixup::Create(CurByte, Expr, FixupKind));
// TODO: Why the 4 zeros?
EmitConstant(0, ImmSize, CurByte, OS);
}
void
AMDILMCCodeEmitter::EncodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups) const
{
#if 0
unsigned Opcode = MI.getOpcode();
const TargetInstrDesc &Desc = TII.get(Opcode);
unsigned TSFlags = Desc.TSFlags;
// Keep track of the current byte being emitted.
unsigned CurByte = 0;
unsigned NumOps = Desc.getNumOperands();
unsigned CurOp = 0;
unsigned char BaseOpcode = 0;
#ifndef NDEBUG
// FIXME: Verify.
if (// !Desc.isVariadic() &&
CurOp != NumOps) {
errs() << "Cannot encode all operands of: ";
MI.dump();
errs() << '\n';
abort();
}
#endif
#endif
}
#endif
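EmitConstant in the (currently disabled) emitter above writes a value least-significant byte first. The same loop in isolation, appending to a buffer instead of a `raw_ostream` (hypothetical `emitLE` helper, not part of the emitter):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append 'Size' bytes of 'Val' to 'Out' in little-endian order,
// exactly as EmitConstant shifts the value down byte by byte.
inline void emitLE(uint64_t Val, unsigned Size, std::vector<uint8_t> &Out) {
  for (unsigned i = 0; i != Size; ++i) {
    Out.push_back((uint8_t)(Val & 255)); // low byte first
    Val >>= 8;
  }
}
```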


@@ -0,0 +1,597 @@
//===-- AMDILMachineFunctionInfo.cpp - TODO: Add brief description -------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILMachineFunctionInfo.h"
#include "AMDILCompilerErrors.h"
#include "AMDILModuleInfo.h"
#include "AMDILSubtarget.h"
#include "AMDILTargetMachine.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/Constants.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Function.h"
#include "llvm/Instructions.h"
#include "llvm/Support/FormattedStream.h"
using namespace llvm;
static const AMDILConstPtr *getConstPtr(const AMDILKernel *krnl, const std::string &arg) {
llvm::SmallVector<AMDILConstPtr, DEFAULT_VEC_SLOTS>::const_iterator begin, end;
for (begin = krnl->constPtr.begin(), end = krnl->constPtr.end();
begin != end; ++begin) {
if (!strcmp(begin->name.data(),arg.c_str())) {
return &(*begin);
}
}
return NULL;
}
void PrintfInfo::addOperand(size_t idx, uint32_t size) {
mOperands.resize((unsigned)(idx + 1));
mOperands[(unsigned)idx] = size;
}
uint32_t PrintfInfo::getPrintfID() {
return mPrintfID;
}
void PrintfInfo::setPrintfID(uint32_t id) {
mPrintfID = id;
}
size_t PrintfInfo::getNumOperands() {
return mOperands.size();
}
uint32_t PrintfInfo::getOperandID(uint32_t idx) {
return mOperands[idx];
}
AMDILMachineFunctionInfo::AMDILMachineFunctionInfo()
: CalleeSavedFrameSize(0), BytesToPopOnReturn(0),
DecorationStyle(None), ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),
SRetReturnReg(0), UsesLDS(false), LDSArg(false),
UsesGDS(false), GDSArg(false),
mReservedLits(9)
{
for (uint32_t x = 0; x < AMDILDevice::MAX_IDS; ++x) {
mUsedMem[x] = false;
}
mMF = NULL;
mKernel = NULL;
mScratchSize = -1;
mArgSize = -1;
mStackSize = -1;
}
AMDILMachineFunctionInfo::AMDILMachineFunctionInfo(MachineFunction& MF)
: CalleeSavedFrameSize(0), BytesToPopOnReturn(0),
DecorationStyle(None), ReturnAddrIndex(0),
TailCallReturnAddrDelta(0),
SRetReturnReg(0), UsesLDS(false), LDSArg(false),
UsesGDS(false), GDSArg(false),
mReservedLits(9)
{
for (uint32_t x = 0; x < AMDILDevice::MAX_IDS; ++x) {
mUsedMem[x] = false;
}
const Function *F = MF.getFunction();
mMF = &MF;
MachineModuleInfo &mmi = MF.getMMI();
const AMDILTargetMachine *TM =
reinterpret_cast<const AMDILTargetMachine*>(&MF.getTarget());
AMDILModuleInfo *AMI = &(mmi.getObjFileInfo<AMDILModuleInfo>());
AMI->processModule(mmi.getModule(), TM);
mSTM = TM->getSubtargetImpl();
mKernel = AMI->getKernel(F->getName());
mScratchSize = -1;
mArgSize = -1;
mStackSize = -1;
}
AMDILMachineFunctionInfo::~AMDILMachineFunctionInfo()
{
for (std::map<std::string, PrintfInfo*>::iterator pfb = printf_begin(),
pfe = printf_end(); pfb != pfe; ++pfb) {
delete pfb->second;
}
}
unsigned int
AMDILMachineFunctionInfo::getCalleeSavedFrameSize() const
{
return CalleeSavedFrameSize;
}
void
AMDILMachineFunctionInfo::setCalleeSavedFrameSize(unsigned int bytes)
{
CalleeSavedFrameSize = bytes;
}
unsigned int
AMDILMachineFunctionInfo::getBytesToPopOnReturn() const
{
return BytesToPopOnReturn;
}
void
AMDILMachineFunctionInfo::setBytesToPopOnReturn(unsigned int bytes)
{
BytesToPopOnReturn = bytes;
}
NameDecorationStyle
AMDILMachineFunctionInfo::getDecorationStyle() const
{
return DecorationStyle;
}
void
AMDILMachineFunctionInfo::setDecorationStyle(NameDecorationStyle style)
{
DecorationStyle = style;
}
int
AMDILMachineFunctionInfo::getRAIndex() const
{
return ReturnAddrIndex;
}
void
AMDILMachineFunctionInfo::setRAIndex(int index)
{
ReturnAddrIndex = index;
}
int
AMDILMachineFunctionInfo::getTCReturnAddrDelta() const
{
return TailCallReturnAddrDelta;
}
void
AMDILMachineFunctionInfo::setTCReturnAddrDelta(int delta)
{
TailCallReturnAddrDelta = delta;
}
unsigned int
AMDILMachineFunctionInfo::getSRetReturnReg() const
{
return SRetReturnReg;
}
void
AMDILMachineFunctionInfo::setSRetReturnReg(unsigned int reg)
{
SRetReturnReg = reg;
}
void
AMDILMachineFunctionInfo::setUsesLocal()
{
UsesLDS = true;
}
bool
AMDILMachineFunctionInfo::usesLocal() const
{
return UsesLDS;
}
void
AMDILMachineFunctionInfo::setHasLocalArg()
{
LDSArg = true;
}
bool
AMDILMachineFunctionInfo::hasLocalArg() const
{
return LDSArg;
}
void
AMDILMachineFunctionInfo::setUsesRegion()
{
UsesGDS = true;
}
bool
AMDILMachineFunctionInfo::usesRegion() const
{
return UsesGDS;
}
void
AMDILMachineFunctionInfo::setHasRegionArg()
{
GDSArg = true;
}
bool
AMDILMachineFunctionInfo::hasRegionArg() const
{
return GDSArg;
}
bool
AMDILMachineFunctionInfo::usesHWConstant(std::string name) const
{
const AMDILConstPtr *curConst = getConstPtr(mKernel, name);
if (curConst) {
return curConst->usesHardware;
} else {
return false;
}
}
uint32_t
AMDILMachineFunctionInfo::getLocal(uint32_t dim)
{
if (mKernel && mKernel->sgv) {
AMDILKernelAttr *sgv = mKernel->sgv;
switch (dim) {
default: break;
case 0:
case 1:
case 2:
return sgv->reqGroupSize[dim];
case 3:
return sgv->reqGroupSize[0] * sgv->reqGroupSize[1] * sgv->reqGroupSize[2];
};
}
switch (dim) {
default:
return 1;
case 3:
return mSTM->getDefaultSize(0) *
mSTM->getDefaultSize(1) *
mSTM->getDefaultSize(2);
case 2:
case 1:
case 0:
return mSTM->getDefaultSize(dim);
};
return 1;
}
bool
AMDILMachineFunctionInfo::isKernel() const
{
return mKernel != NULL && mKernel->mKernel;
}
AMDILKernel*
AMDILMachineFunctionInfo::getKernel()
{
return mKernel;
}
std::string
AMDILMachineFunctionInfo::getName()
{
if (mMF) {
return mMF->getFunction()->getName();
} else {
return "";
}
}
uint32_t
AMDILMachineFunctionInfo::getArgSize()
{
if (mArgSize == -1) {
Function::const_arg_iterator I = mMF->getFunction()->arg_begin();
Function::const_arg_iterator Ie = mMF->getFunction()->arg_end();
uint32_t Counter = 0;
while (I != Ie) {
Type* curType = I->getType();
if (curType->isIntegerTy() || curType->isFloatingPointTy()) {
++Counter;
} else if (const VectorType *VT = dyn_cast<VectorType>(curType)) {
Type *ET = VT->getElementType();
int numEle = VT->getNumElements();
switch (ET->getPrimitiveSizeInBits()) {
default:
if (numEle == 3) {
Counter++;
} else {
Counter += ((numEle + 2) >> 2);
}
break;
case 64:
if (numEle == 3) {
Counter += 2;
} else {
Counter += (numEle >> 1);
}
break;
case 16:
case 8:
switch (numEle) {
default:
Counter += ((numEle + 2) >> 2);
// fall through
case 2:
Counter++;
break;
}
break;
}
} else if (const PointerType *PT = dyn_cast<PointerType>(curType)) {
Type *CT = PT->getElementType();
const StructType *ST = dyn_cast<StructType>(CT);
if (ST && ST->isOpaque()) {
bool i1d = ST->getName() == "struct._image1d_t";
bool i1da = ST->getName() == "struct._image1d_array_t";
bool i1db = ST->getName() == "struct._image1d_buffer_t";
bool i2d = ST->getName() == "struct._image2d_t";
bool i2da = ST->getName() == "struct._image2d_array_t";
bool i3d = ST->getName() == "struct._image3d_t";
bool is_image = i1d || i1da || i1db || i2d || i2da || i3d;
if (is_image) {
if (mSTM->device()->isSupported(AMDILDeviceInfo::Images)) {
Counter += 2;
} else {
addErrorMsg(amd::CompilerErrorMessage[NO_IMAGE_SUPPORT]);
}
} else {
Counter++;
}
} else if (CT->isStructTy()
&& PT->getAddressSpace() == AMDILAS::PRIVATE_ADDRESS) {
StructType *ST = dyn_cast<StructType>(CT);
Counter += ((getTypeSize(ST) + 15) & ~15) >> 4;
} else if (CT->isIntOrIntVectorTy()
|| CT->isFPOrFPVectorTy()
|| CT->isArrayTy()
|| CT->isPointerTy()
|| PT->getAddressSpace() != AMDILAS::PRIVATE_ADDRESS) {
++Counter;
} else {
assert(0 && "Current type is not supported!");
addErrorMsg(amd::CompilerErrorMessage[INTERNAL_ERROR]);
}
} else {
assert(0 && "Current type is not supported!");
addErrorMsg(amd::CompilerErrorMessage[INTERNAL_ERROR]);
}
++I;
}
// Convert from slots to bytes by multiplying by 16(shift by 4).
mArgSize = Counter << 4;
}
return (uint32_t)mArgSize;
}
uint32_t
AMDILMachineFunctionInfo::getScratchSize()
{
if (mScratchSize == -1) {
mScratchSize = 0;
Function::const_arg_iterator I = mMF->getFunction()->arg_begin();
Function::const_arg_iterator Ie = mMF->getFunction()->arg_end();
while (I != Ie) {
Type *curType = I->getType();
mScratchSize += ((getTypeSize(curType) + 15) & ~15);
++I;
}
mScratchSize += ((mScratchSize + 15) & ~15);
}
return (uint32_t)mScratchSize;
}
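Both getScratchSize and getArgSize round sizes up to 16-byte slots with the `(x + 15) & ~15` idiom and convert between bytes and 128-bit slots with shifts by 4. Those two pieces of arithmetic in isolation (hypothetical helper names, not the driver's API):

```cpp
#include <cassert>
#include <cstdint>

// Round 'bytes' up to the next multiple of 16, the AMDIL slot size,
// using the same mask trick as getScratchSize/getArgSize.
inline uint32_t alignTo16(uint32_t bytes) {
  return (bytes + 15) & ~15u;
}

// Convert an aligned byte count to 128-bit slots, matching the
// 'Counter << 4' / '>> 4' conversions in getArgSize.
inline uint32_t bytesToSlots(uint32_t alignedBytes) {
  return alignedBytes >> 4;
}
```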
uint32_t
AMDILMachineFunctionInfo::getStackSize()
{
if (mStackSize == -1) {
uint32_t privSize = 0;
const MachineFrameInfo *MFI = mMF->getFrameInfo();
privSize = MFI->getOffsetAdjustment() + MFI->getStackSize();
const AMDILTargetMachine *TM =
reinterpret_cast<const AMDILTargetMachine*>(&mMF->getTarget());
bool addStackSize = TM->getOptLevel() == CodeGenOpt::None;
Function::const_arg_iterator I = mMF->getFunction()->arg_begin();
Function::const_arg_iterator Ie = mMF->getFunction()->arg_end();
while (I != Ie) {
Type *curType = I->getType();
++I;
if (dyn_cast<PointerType>(curType)) {
Type *CT = dyn_cast<PointerType>(curType)->getElementType();
if (CT->isStructTy()
&& dyn_cast<PointerType>(curType)->getAddressSpace()
== AMDILAS::PRIVATE_ADDRESS) {
addStackSize = true;
}
}
}
if (addStackSize) {
privSize += getScratchSize();
}
mStackSize = privSize;
}
return (uint32_t)mStackSize;
}
uint32_t
AMDILMachineFunctionInfo::addi32Literal(uint32_t val, int Opcode) {
// Since we have emulated 16/8/1-bit register types with a 32-bit real
// register, we need to sign-extend the constants to 32 bits in order for
// comparisons against the constants to work correctly; this fixes some
// conformance failures we had with saturation.
if (Opcode == AMDIL::LOADCONST_i16) {
val = (((int32_t)val << 16) >> 16);
} else if (Opcode == AMDIL::LOADCONST_i8) {
val = (((int32_t)val << 24) >> 24);
}
if (mIntLits.find(val) == mIntLits.end()) {
mIntLits[val] = getNumLiterals();
}
return mIntLits[val];
}
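The shift pair in addi32Literal above sign-extends a narrow immediate to the full 32-bit register width. The same trick as standalone helpers (hypothetical names; like the original, this relies on arithmetic right shift of a negative `int32_t`):

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 8 bits of 'val' to 32 bits: shift the byte up to
// the sign position, then arithmetic-shift it back down.
inline uint32_t sext8(uint32_t val) {
  return (uint32_t)(((int32_t)val << 24) >> 24);
}

// Sign-extend the low 16 bits of 'val' to 32 bits the same way.
inline uint32_t sext16(uint32_t val) {
  return (uint32_t)(((int32_t)val << 16) >> 16);
}
```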
uint32_t
AMDILMachineFunctionInfo::addi64Literal(uint64_t val) {
if (mLongLits.find(val) == mLongLits.end()) {
mLongLits[val] = getNumLiterals();
}
return mLongLits[val];
}
uint32_t
AMDILMachineFunctionInfo::addi128Literal(uint64_t val_lo, uint64_t val_hi) {
std::pair<uint64_t, uint64_t> a;
a.first = val_lo;
a.second = val_hi;
if (mVecLits.find(a) == mVecLits.end()) {
mVecLits[a] = getNumLiterals();
}
return mVecLits[a];
}
uint32_t
AMDILMachineFunctionInfo::addf32Literal(const ConstantFP *CFP) {
uint32_t val = (uint32_t)CFP->getValueAPF().bitcastToAPInt().getZExtValue();
if (mIntLits.find(val) == mIntLits.end()) {
mIntLits[val] = getNumLiterals();
}
return mIntLits[val];
}
uint32_t
AMDILMachineFunctionInfo::addf64Literal(const ConstantFP *CFP) {
union dtol_union {
double d;
uint64_t ul;
} dval;
const APFloat &APF = CFP->getValueAPF();
if (&APF.getSemantics() == (const llvm::fltSemantics *)&APFloat::IEEEsingle) {
float fval = APF.convertToFloat();
dval.d = (double)fval;
} else {
dval.d = APF.convertToDouble();
}
if (mLongLits.find(dval.ul) == mLongLits.end()) {
mLongLits[dval.ul] = getNumLiterals();
}
return mLongLits[dval.ul];
}
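addf64Literal keys the 64-bit literal map on the raw IEEE-754 bit pattern of the double, read through a union. The same bit cast, sketched with `memcpy` to sidestep type-punning concerns (hypothetical `doubleBits` helper, not the driver's code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Return the IEEE-754 bit pattern of 'd' -- the value addf64Literal
// uses as the key into the 64-bit literal map.
inline uint64_t doubleBits(double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits)); // well-defined bit cast
  return bits;
}
```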
uint32_t
AMDILMachineFunctionInfo::getIntLits(uint32_t offset)
{
return mIntLits[offset];
}
uint32_t
AMDILMachineFunctionInfo::getLongLits(uint64_t offset)
{
return mLongLits[offset];
}
uint32_t
AMDILMachineFunctionInfo::getVecLits(uint64_t low64, uint64_t high64)
{
return mVecLits[std::pair<uint64_t, uint64_t>(low64, high64)];
}
size_t
AMDILMachineFunctionInfo::getNumLiterals() const {
return mLongLits.size() + mIntLits.size() + mVecLits.size() + mReservedLits;
}
void
AMDILMachineFunctionInfo::addReservedLiterals(uint32_t size)
{
mReservedLits += size;
}
uint32_t
AMDILMachineFunctionInfo::addSampler(std::string name, uint32_t val)
{
if (mSamplerMap.find(name) != mSamplerMap.end()) {
SamplerInfo newVal = mSamplerMap[name];
assert(newVal.val == val
&& "Found a sampler with same name but different values!");
return mSamplerMap[name].idx;
} else {
SamplerInfo curVal;
curVal.name = name;
curVal.val = val;
curVal.idx = mSamplerMap.size();
mSamplerMap[name] = curVal;
return curVal.idx;
}
}
void
AMDILMachineFunctionInfo::setUsesMem(unsigned id) {
assert(id < AMDILDevice::MAX_IDS &&
"Must set the ID to be less than MAX_IDS!");
mUsedMem[id] = true;
}
bool
AMDILMachineFunctionInfo::usesMem(unsigned id) {
assert(id < AMDILDevice::MAX_IDS &&
"Must set the ID to be less than MAX_IDS!");
return mUsedMem[id];
}
void
AMDILMachineFunctionInfo::addErrorMsg(const char *msg, ErrorMsgEnum val)
{
if (val == DEBUG_ONLY) {
#if defined(DEBUG) || defined(_DEBUG)
mErrors.insert(msg);
#endif
} else if (val == RELEASE_ONLY) {
#if !defined(DEBUG) && !defined(_DEBUG)
mErrors.insert(msg);
#endif
} else if (val == ALWAYS) {
mErrors.insert(msg);
}
}
uint32_t
AMDILMachineFunctionInfo::addPrintfString(std::string &name, unsigned offset)
{
if (mPrintfMap.find(name) != mPrintfMap.end()) {
return mPrintfMap[name]->getPrintfID();
} else {
PrintfInfo *info = new PrintfInfo;
info->setPrintfID(mPrintfMap.size() + offset);
mPrintfMap[name] = info;
return info->getPrintfID();
}
}
void
AMDILMachineFunctionInfo::addPrintfOperand(std::string &name,
size_t idx,
uint32_t size)
{
mPrintfMap[name]->addOperand(idx, size);
}
void
AMDILMachineFunctionInfo::addMetadata(const char *md, bool kernelOnly)
{
addMetadata(std::string(md), kernelOnly);
}
void
AMDILMachineFunctionInfo::addMetadata(std::string md, bool kernelOnly)
{
if (kernelOnly) {
mMetadataKernel.push_back(md);
} else {
mMetadataFunc.insert(md);
}
}


@@ -0,0 +1,422 @@
//== AMDILMachineFunctionInfo.h - AMDIL Machine Function Info -*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This file declares AMDIL-specific per-machine-function information
//
//===----------------------------------------------------------------------===//
#ifndef _AMDILMACHINEFUNCTIONINFO_H_
#define _AMDILMACHINEFUNCTIONINFO_H_
#include "AMDIL.h"
#include "AMDILDevice.h"
#include "AMDILKernel.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/ValueMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/Function.h"
#include <map>
#include <set>
#include <string>
namespace llvm
{
class AMDILSubtarget;
class PrintfInfo {
uint32_t mPrintfID;
SmallVector<uint32_t, DEFAULT_VEC_SLOTS> mOperands;
public:
void addOperand(size_t idx, uint32_t size);
uint32_t getPrintfID();
void setPrintfID(uint32_t idx);
size_t getNumOperands();
uint32_t getOperandID(uint32_t idx);
}; // class PrintfInfo
enum NameDecorationStyle
{
None,
StdCall,
FastCall
};
typedef struct SamplerInfoRec {
std::string name; // The name of the sampler
uint32_t val; // The value of the sampler
uint32_t idx; // The sampler resource id
} SamplerInfo;
// Some typedefs that will help with using the various iterators
// of the machine function info class.
typedef std::map<uint32_t, uint32_t>::iterator lit32_iterator;
typedef std::map<uint64_t, uint32_t>::iterator lit64_iterator;
typedef std::map<std::pair<uint64_t, uint64_t>, uint32_t>::iterator
lit128_iterator;
typedef StringMap<SamplerInfo>::iterator sampler_iterator;
typedef DenseSet<uint32_t>::iterator func_iterator;
typedef DenseSet<uint32_t>::iterator intr_iterator;
typedef DenseSet<uint32_t>::iterator uav_iterator;
typedef DenseSet<uint32_t>::iterator read_image2d_iterator;
typedef DenseSet<uint32_t>::iterator read_image3d_iterator;
typedef DenseSet<uint32_t>::iterator write_image2d_iterator;
typedef DenseSet<uint32_t>::iterator write_image3d_iterator;
typedef DenseSet<const char*>::iterator error_iterator;
typedef std::map<std::string, PrintfInfo*>::iterator printf_iterator;
typedef std::set<std::string>::iterator func_md_iterator;
typedef std::vector<std::string>::iterator kernel_md_iterator;
// AMDILMachineFunctionInfo - This class is derived from
// MachineFunctionInfo and contains private AMDIL target-specific
// information for each MachineFunction.
class AMDILMachineFunctionInfo : public MachineFunctionInfo
{
// CalleeSavedFrameSize - Size of the callee-saved
// register portion of the
// stack frame in bytes.
unsigned int CalleeSavedFrameSize;
// BytesToPopOnReturn - Number of bytes function pops on return.
// Used on windows platform for stdcall & fastcall name decoration
unsigned int BytesToPopOnReturn;
// DecorationStyle - If the function requires additional
// name decoration,
// DecorationStyle holds the right way to do so.
NameDecorationStyle DecorationStyle;
// ReturnAddrIndex - FrameIndex for return slot.
int ReturnAddrIndex;
// TailCallReturnAddrDelta - The delta by which the ReturnAddr stack slot
// is moved. Used for creating an area before the register spill area on
// the stack, into which the return address can be safely moved.
int TailCallReturnAddrDelta;
// SRetReturnReg - Some subtargets require that sret lowering includes
// returning the value of the returned struct in a register.
// This field holds the virtual register into which the sret
// argument is passed.
unsigned int SRetReturnReg;
// UsesLDS - Specifies that this function uses LDS memory and
// that it needs to be allocated.
bool UsesLDS;
// LDSArg - Flag that specifies whether this function has a local (LDS)
// argument or not.
bool LDSArg;
// UsesGDS - Specifies that this function uses GDS memory and
// that it needs to be allocated.
bool UsesGDS;
// GDSArg - Flag that specifies whether this function has a region (GDS)
// argument or not.
bool GDSArg;
// The size in bytes required to host all of the kernel arguments.
// -1 means this value has not been determined yet.
int32_t mArgSize;
// The size in bytes required to host the stack and the kernel arguments
// in private memory.
// -1 means this value has not been determined yet.
int32_t mScratchSize;
// The size in bytes required to host the kernel arguments
// on the stack.
// -1 means this value has not been determined yet.
int32_t mStackSize;
/// A map of constant to literal mapping for all of the 32bit or
/// smaller literals in the current function.
std::map<uint32_t, uint32_t> mIntLits;
/// A map of constant to literal mapping for all of the 64bit
/// literals in the current function.
std::map<uint64_t, uint32_t> mLongLits;
/// A map of constant to literal mapping for all of the 128bit
/// literals in the current function.
std::map<std::pair<uint64_t, uint64_t>, uint32_t> mVecLits;
/// The number of literals that should be reserved.
/// TODO: Remove this when the wrapper emitter is added.
uint32_t mReservedLits;
/// A map of name to sampler information that is used to emit
/// metadata to the IL stream that the runtimes can use for
/// hardware setup.
StringMap<SamplerInfo> mSamplerMap;
/// Array of flags to specify if a specific memory type is used or not.
bool mUsedMem[AMDILDevice::MAX_IDS];
/// Set of all functions that this function calls.
DenseSet<uint32_t> mFuncs;
/// Set of all intrinsics that this function calls.
DenseSet<uint32_t> mIntrs;
/// Set of all read only 2D images.
DenseSet<uint32_t> mRO2D;
/// Set of all read only 3D images.
DenseSet<uint32_t> mRO3D;
/// Set of all write only 2D images.
DenseSet<uint32_t> mWO2D;
/// Set of all write only 3D images.
DenseSet<uint32_t> mWO3D;
/// Set of all the raw uavs.
DenseSet<uint32_t> mRawUAV;
/// Set of all the arena uavs.
DenseSet<uint32_t> mArenaUAV;
/// A set of all errors that occurred in the backend for this function.
DenseSet<const char *> mErrors;
/// A mapping of printf data and the printf string
std::map<std::string, PrintfInfo*> mPrintfMap;
/// A set of all of the metadata that is used for the current function.
std::set<std::string> mMetadataFunc;
/// A set of all of the metadata that is used for the function wrapper.
std::vector<std::string> mMetadataKernel;
/// Information about the kernel, NULL if the function is not a kernel.
AMDILKernel *mKernel;
/// Pointer to the machine function that this information belongs to.
MachineFunction *mMF;
/// Pointer to the subtarget for this function.
const AMDILSubtarget *mSTM;
public:
AMDILMachineFunctionInfo();
AMDILMachineFunctionInfo(MachineFunction &MF);
virtual ~AMDILMachineFunctionInfo();
unsigned int
getCalleeSavedFrameSize() const;
void
setCalleeSavedFrameSize(unsigned int bytes);
unsigned int
getBytesToPopOnReturn() const;
void
setBytesToPopOnReturn (unsigned int bytes);
NameDecorationStyle
getDecorationStyle() const;
void
setDecorationStyle(NameDecorationStyle style);
int
getRAIndex() const;
void
setRAIndex(int Index);
int
getTCReturnAddrDelta() const;
void
setTCReturnAddrDelta(int delta);
unsigned int
getSRetReturnReg() const;
void
setSRetReturnReg(unsigned int Reg);
void
setUsesLocal();
bool
usesLocal() const;
void
setHasLocalArg();
bool
hasLocalArg() const;
void
setUsesRegion();
bool
usesRegion() const;
void
setHasRegionArg();
bool
hasRegionArg() const;
bool
usesHWConstant(std::string name) const;
uint32_t
getLocal(uint32_t);
bool
isKernel() const;
AMDILKernel*
getKernel();
std::string
getName();
/// Get the size in bytes required to hold all of the
/// arguments, based on the argument alignment rules in the AMDIL
/// Metadata spec.
uint32_t getArgSize();
/// Get the size in bytes required to hold all of the
/// arguments and stack memory in scratch space.
uint32_t getScratchSize();
/// Get the size in bytes required to hold all of
/// the arguments on the stack.
uint32_t getStackSize();
///
/// @param val value to add to the lookup table
/// @param Opcode opcode of the literal instruction
/// @brief Adds the specified value of the type represented by the
/// Opcode to the literal-to-integer and integer-to-literal mappings.
///
/// Add a 32-bit integer value to the literal table.
uint32_t addi32Literal(uint32_t val, int Opcode = AMDIL::LOADCONST_i32);
/// Add a 32-bit floating point value to the literal table.
uint32_t addf32Literal(const ConstantFP *CFP);
/// Add a 64-bit integer value to the literal table.
uint32_t addi64Literal(uint64_t val);
/// Add a 128-bit integer value to the literal table.
uint32_t addi128Literal(uint64_t val_lo, uint64_t val_hi);
/// Add a 64-bit floating point literal as a 64-bit integer value.
uint32_t addf64Literal(const ConstantFP *CFP);
/// Get the number of literals that have currently been allocated.
size_t getNumLiterals() const;
/// Get the literal ID of an integer literal with the given value.
uint32_t getIntLits(uint32_t lit);
/// Get the literal ID of a long literal with the given value.
uint32_t getLongLits(uint64_t lit);
/// Get the literal ID of a 128-bit vector literal with the given
/// low and high 64-bit halves.
uint32_t getVecLits(uint64_t low64, uint64_t high64);
/// Add some literals to the number of reserved literals.
void addReservedLiterals(uint32_t);
// Functions that return iterators to the beginning and end
// of the various literal maps.
// Functions that return the beginning and end of the 32bit literal map
lit32_iterator begin_32() { return mIntLits.begin(); }
lit32_iterator end_32() { return mIntLits.end(); }
// Functions that return the beginning and end of the 64bit literal map
lit64_iterator begin_64() { return mLongLits.begin(); }
lit64_iterator end_64() { return mLongLits.end(); }
// Functions that return the beginning and end of the 2x64bit literal map
lit128_iterator begin_128() { return mVecLits.begin(); }
lit128_iterator end_128() { return mVecLits.end(); }
// Add a sampler to the set of known samplers for the current kernel.
uint32_t addSampler(std::string name, uint32_t value);
// Iterators that point to the beginning and end of the sampler map.
sampler_iterator sampler_begin() { return mSamplerMap.begin(); }
sampler_iterator sampler_end() { return mSamplerMap.end(); }
/// Set the flag for the memory ID to true for the current function.
void setUsesMem(unsigned);
/// Retrieve the flag for the memory ID.
bool usesMem(unsigned);
/// Add called functions to the set of all functions this function calls.
void addCalledFunc(uint32_t id) { mFuncs.insert(id); }
void eraseCalledFunc(uint32_t id) { mFuncs.erase(id); }
size_t func_size() { return mFuncs.size(); }
bool func_empty() { return mFuncs.empty(); }
func_iterator func_begin() { return mFuncs.begin(); }
func_iterator func_end() { return mFuncs.end(); }
/// Add called intrinsics to the set of all intrinsics this function calls.
void addCalledIntr(uint32_t id) { mIntrs.insert(id); }
size_t intr_size() { return mIntrs.size(); }
bool intr_empty() { return mIntrs.empty(); }
intr_iterator intr_begin() { return mIntrs.begin(); }
intr_iterator intr_end() { return mIntrs.end(); }
/// Add a 2D read_only image id.
void addROImage2D(uint32_t id) { mRO2D.insert(id); }
size_t read_image2d_size() { return mRO2D.size(); }
read_image2d_iterator read_image2d_begin() { return mRO2D.begin(); }
read_image2d_iterator read_image2d_end() { return mRO2D.end(); }
/// Add a 3D read_only image id.
void addROImage3D(uint32_t id) { mRO3D.insert(id); }
size_t read_image3d_size() { return mRO3D.size(); }
read_image3d_iterator read_image3d_begin() { return mRO3D.begin(); }
read_image3d_iterator read_image3d_end() { return mRO3D.end(); }
/// Add a 2D write_only image id.
void addWOImage2D(uint32_t id) { mWO2D.insert(id); }
size_t write_image2d_size() { return mWO2D.size(); }
write_image2d_iterator write_image2d_begin() { return mWO2D.begin(); }
write_image2d_iterator write_image2d_end() { return mWO2D.end(); }
/// Add a 3D write_only image id.
void addWOImage3D(uint32_t id) { mWO3D.insert(id); }
size_t write_image3d_size() { return mWO3D.size(); }
write_image3d_iterator write_image3d_begin() { return mWO3D.begin(); }
write_image3d_iterator write_image3d_end() { return mWO3D.end(); }
/// Add a raw uav id.
void uav_insert(uint32_t id) { mRawUAV.insert(id); }
bool uav_count(uint32_t id) { return mRawUAV.count(id); }
size_t uav_size() { return mRawUAV.size(); }
uav_iterator uav_begin() { return mRawUAV.begin(); }
uav_iterator uav_end() { return mRawUAV.end(); }
/// Add an arena uav id.
void arena_insert(uint32_t id) { mArenaUAV.insert(id); }
bool arena_count(uint32_t id) { return mArenaUAV.count(id); }
size_t arena_size() { return mArenaUAV.size(); }
uav_iterator arena_begin() { return mArenaUAV.begin(); }
uav_iterator arena_end() { return mArenaUAV.end(); }
/// Severity classes for error messages emitted by the backend.
typedef enum {
RELEASE_ONLY, /// Only emit error message in release mode.
DEBUG_ONLY, /// Only emit error message in debug mode.
ALWAYS /// Always emit the error message.
} ErrorMsgEnum;
/// Add an error message to the set of all error messages.
void addErrorMsg(const char* msg, ErrorMsgEnum val = ALWAYS);
bool errors_empty() { return mErrors.empty(); }
error_iterator errors_begin() { return mErrors.begin(); }
error_iterator errors_end() { return mErrors.end(); }
/// Add a string to the printf map
uint32_t addPrintfString(std::string &name, unsigned offset);
/// Add an operand to the printf string
void addPrintfOperand(std::string &name, size_t idx, uint32_t size);
bool printf_empty() { return mPrintfMap.empty(); }
size_t printf_size() { return mPrintfMap.size(); }
printf_iterator printf_begin() { return mPrintfMap.begin(); }
printf_iterator printf_end() { return mPrintfMap.end(); }
/// Add a string to the metadata set for a function/kernel wrapper
void addMetadata(const char *md, bool kernelOnly = false);
void addMetadata(std::string md, bool kernelOnly = false);
func_md_iterator func_md_begin() { return mMetadataFunc.begin(); }
func_md_iterator func_md_end() { return mMetadataFunc.end(); }
kernel_md_iterator kernel_md_begin() { return mMetadataKernel.begin(); }
kernel_md_iterator kernel_md_end() { return mMetadataKernel.end(); }
};
} // llvm namespace
#endif // _AMDILMACHINEFUNCTIONINFO_H_
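The literal-table methods above (addi32Literal, getNumLiterals, and friends) assign each distinct constant a small integer ID that the IL emitter can later reference. A minimal standalone sketch of that numbering scheme, with hypothetical names and no LLVM dependencies:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Toy model of the literal-to-ID mapping kept by AMDILMachineFunctionInfo:
// each distinct 32-bit constant gets the next free ID; repeats reuse it,
// so each literal declaration is emitted only once.
class LiteralTable {
  std::map<uint64_t, uint32_t> mIntLits; // value -> literal ID
  uint32_t mNextID = 1;                  // IDs start at 1; 0 stays reserved
public:
  uint32_t addi32Literal(uint32_t val) {
    auto it = mIntLits.find(val);
    if (it != mIntLits.end())
      return it->second;                 // already allocated: reuse the ID
    uint32_t id = mNextID++;
    mIntLits[val] = id;
    return id;
  }
  std::size_t getNumLiterals() const { return mIntLits.size(); }
};
```

Inserting the same value twice returns the same ID, which is the property the emitter relies on when it walks `begin_32()`/`end_32()` to print the literal pool.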


@@ -0,0 +1,173 @@
//===-- AMDILMachinePeephole.cpp - AMDIL Machine Peephole Pass -*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#define DEBUG_TYPE "machine_peephole"
#if !defined(NDEBUG)
#define DEBUGME (DebugFlag && isCurrentDebugType(DEBUG_TYPE))
#else
#define DEBUGME (false)
#endif
#include "AMDIL.h"
#include "AMDILSubtarget.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetMachine.h"
using namespace llvm;
namespace
{
class AMDILMachinePeephole : public MachineFunctionPass
{
public:
static char ID;
AMDILMachinePeephole(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
//virtual ~AMDILMachinePeephole();
virtual const char*
getPassName() const;
virtual bool
runOnMachineFunction(MachineFunction &MF);
private:
void insertFence(MachineBasicBlock::iterator &MIB);
TargetMachine &TM;
bool mDebug;
}; // AMDILMachinePeephole
char AMDILMachinePeephole::ID = 0;
} // anonymous namespace
namespace llvm
{
FunctionPass*
createAMDILMachinePeephole(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
{
return new AMDILMachinePeephole(tm AMDIL_OPT_LEVEL_VAR);
}
} // llvm namespace
AMDILMachinePeephole::AMDILMachinePeephole(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
: MachineFunctionPass(ID), TM(tm)
{
mDebug = DEBUGME;
}
bool
AMDILMachinePeephole::runOnMachineFunction(MachineFunction &MF)
{
bool Changed = false;
const AMDILSubtarget *STM = &TM.getSubtarget<AMDILSubtarget>();
for (MachineFunction::iterator MBB = MF.begin(), MBE = MF.end();
MBB != MBE; ++MBB) {
MachineBasicBlock *mb = MBB;
for (MachineBasicBlock::iterator MIB = mb->begin(), MIE = mb->end();
MIB != MIE; ++MIB) {
MachineInstr *mi = MIB;
const char *name = TM.getInstrInfo()->getName(mi->getOpcode());
switch (mi->getOpcode()) {
default:
if (isAtomicInst(TM.getInstrInfo(), mi)) {
// If we don't support the hardware accelerated address spaces,
// then the atomic needs to be transformed to the global atomic.
if (strstr(name, "_L_")
&& STM->device()->usesSoftware(AMDILDeviceInfo::LocalMem)) {
BuildMI(*mb, MIB, mi->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::ADD_i32), AMDIL::R1011)
.addReg(mi->getOperand(1).getReg())
.addReg(AMDIL::T2);
mi->getOperand(1).setReg(AMDIL::R1011);
mi->setDesc(
TM.getInstrInfo()->get(
(mi->getOpcode() - AMDIL::ATOM_L_ADD) + AMDIL::ATOM_G_ADD));
} else if (strstr(name, "_R_")
&& STM->device()->usesSoftware(AMDILDeviceInfo::RegionMem)) {
assert(!"Software region memory is not supported!");
mi->setDesc(
TM.getInstrInfo()->get(
(mi->getOpcode() - AMDIL::ATOM_R_ADD) + AMDIL::ATOM_G_ADD));
}
} else if ((isLoadInst(TM.getInstrInfo(), mi) || isStoreInst(TM.getInstrInfo(), mi)) && isVolatileInst(TM.getInstrInfo(), mi)) {
insertFence(MIB);
}
continue;
case AMDIL::USHR_i16:
case AMDIL::USHR_v2i16:
case AMDIL::USHR_v4i16:
case AMDIL::USHRVEC_i16:
case AMDIL::USHRVEC_v2i16:
case AMDIL::USHRVEC_v4i16:
if (TM.getSubtarget<AMDILSubtarget>()
.device()->usesSoftware(AMDILDeviceInfo::ShortOps)) {
unsigned lReg = MF.getRegInfo()
.createVirtualRegister(&AMDIL::GPRI32RegClass);
unsigned Reg = MF.getRegInfo()
.createVirtualRegister(&AMDIL::GPRV4I32RegClass);
BuildMI(*mb, MIB, mi->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::LOADCONST_i32),
lReg).addImm(0xFFFF);
BuildMI(*mb, MIB, mi->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::BINARY_AND_v4i32),
Reg)
.addReg(mi->getOperand(1).getReg())
.addReg(lReg);
mi->getOperand(1).setReg(Reg);
}
break;
case AMDIL::USHR_i8:
case AMDIL::USHR_v2i8:
case AMDIL::USHR_v4i8:
case AMDIL::USHRVEC_i8:
case AMDIL::USHRVEC_v2i8:
case AMDIL::USHRVEC_v4i8:
if (TM.getSubtarget<AMDILSubtarget>()
.device()->usesSoftware(AMDILDeviceInfo::ByteOps)) {
unsigned lReg = MF.getRegInfo()
.createVirtualRegister(&AMDIL::GPRI32RegClass);
unsigned Reg = MF.getRegInfo()
.createVirtualRegister(&AMDIL::GPRV4I32RegClass);
BuildMI(*mb, MIB, mi->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::LOADCONST_i32),
lReg).addImm(0xFF);
BuildMI(*mb, MIB, mi->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::BINARY_AND_v4i32),
Reg)
.addReg(mi->getOperand(1).getReg())
.addReg(lReg);
mi->getOperand(1).setReg(Reg);
}
break;
}
}
}
return Changed;
}
const char*
AMDILMachinePeephole::getPassName() const
{
return "AMDIL Generic Machine Peephole Optimization Pass";
}
void
AMDILMachinePeephole::insertFence(MachineBasicBlock::iterator &MIB)
{
MachineInstr *MI = MIB;
MachineInstr *fence = BuildMI(*(MI->getParent()->getParent()),
MI->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::FENCE)).addReg(1);
MI->getParent()->insert(MIB, fence);
fence = BuildMI(*(MI->getParent()->getParent()),
MI->getDebugLoc(),
TM.getInstrInfo()->get(AMDIL::FENCE)).addReg(1);
MIB = MI->getParent()->insertAfter(MIB, fence);
}
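The atomic rewrite above relies on offset arithmetic, `(mi->getOpcode() - AMDIL::ATOM_L_ADD) + AMDIL::ATOM_G_ADD`, which only works because the local and global atomic opcode blocks are laid out in the same order. A standalone sketch of the trick with made-up enum values (the real opcodes come from the generated AMDIL instruction tables):

```cpp
// Hypothetical opcode layout: the _L_ (local) and _G_ (global) atomic
// blocks are parallel, so a single subtract-and-add remaps any member
// of one block to the corresponding member of the other.
enum Opcode {
  ATOM_G_ADD = 100, ATOM_G_AND, ATOM_G_SUB, ATOM_G_XOR,
  ATOM_L_ADD = 200, ATOM_L_AND, ATOM_L_SUB, ATOM_L_XOR,
};

// Remap a local atomic opcode to its global equivalent.
inline Opcode localToGlobal(Opcode op) {
  return static_cast<Opcode>((op - ATOM_L_ADD) + ATOM_G_ADD);
}
```

The same pattern covers the region (`_R_`) case in the pass; the only requirement is that every block list its operations in the same order.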

File diff suppressed because it is too large.


@@ -0,0 +1,159 @@
//===--------------- AMDILModuleInfo.h -------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// This is an MMI implementation for AMDIL targets.
//
//===----------------------------------------------------------------------===//
#ifndef _AMDIL_MACHINE_MODULE_INFO_H_
#define _AMDIL_MACHINE_MODULE_INFO_H_
#include "AMDIL.h"
#include "AMDILKernel.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/CodeGen/MachineModuleInfo.h"
#include "llvm/Module.h"
#include "llvm/Support/raw_ostream.h"
#include <set>
#include <string>
namespace llvm {
class AMDILKernel;
class Argument;
class TypeSymbolTable;
class GlobalValue;
class MachineFunction;
class GlobalValue;
class AMDILModuleInfo : public MachineModuleInfoImpl {
protected:
const MachineModuleInfo *mMMI;
public:
AMDILModuleInfo(const MachineModuleInfo &);
virtual ~AMDILModuleInfo();
/// Process the given module and parse out the global variable metadata passed
/// down from the frontend compiler.
void processModule(const Module *M, const AMDILTargetMachine *mTM);
/// Returns true if the image ID corresponds to a read only image.
bool isReadOnlyImage(const llvm::StringRef &name, uint32_t iID) const;
/// Returns true if the image ID corresponds to a write only image.
bool isWriteOnlyImage(const llvm::StringRef &name, uint32_t iID) const;
/// Gets the group size of the kernel for the given dimension.
uint32_t getRegion(const llvm::StringRef &name, uint32_t dim) const;
/// Get the offset of the array for the kernel.
int32_t getArrayOffset(const llvm::StringRef &name) const;
/// Get the offset of the const memory for the kernel.
int32_t getConstOffset(const llvm::StringRef &name) const;
/// Get the boolean value if this particular constant uses HW or not.
bool getConstHWBit(const llvm::StringRef &name) const;
/// Get a reference to the kernel metadata information for the given function
/// name.
AMDILKernel *getKernel(const llvm::StringRef &name);
bool isKernel(const llvm::StringRef &name) const;
/// Dump the data section to the output stream for the given kernel.
//void dumpDataSection(llvm::raw_ostream &O, AMDILKernelManager *km);
/// Iterate through the constants that are global to the compilation unit.
StringMap<AMDILConstPtr>::iterator consts_begin();
StringMap<AMDILConstPtr>::iterator consts_end();
/// Query if the kernel has a byte store.
bool byteStoreExists(llvm::StringRef S) const;
/// Query if the constant pointer is an argument.
bool isConstPtrArgument(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Query if the constant pointer is an array that is globally scoped.
bool isConstPtrArray(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Query the size of the constant pointer.
uint32_t getConstPtrSize(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Query the offset of the constant pointer.
uint32_t getConstPtrOff(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Query the constant buffer number for a constant pointer.
uint32_t getConstPtrCB(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Query the Value* that the constant pointer originates from.
const Value *getConstPtrValue(const AMDILKernel *krnl, const llvm::StringRef &arg);
/// Get the ID of the argument.
int32_t getArgID(const Argument *arg);
/// Get the unique function ID for the specific function name and create a new
/// unique ID if it is not found.
uint32_t getOrCreateFunctionID(const GlobalValue* func);
uint32_t getOrCreateFunctionID(const std::string& func);
/// Calculate the offsets of the constant pool for the given kernel and
/// machine function.
void calculateCPOffsets(const MachineFunction *MF, AMDILKernel *krnl);
void add_printf_offset(uint32_t offset) { mPrintfOffset += offset; }
uint32_t get_printf_offset() { return mPrintfOffset; }
private:
/// Various functions that parse global value information and store them in
/// the global manager. This approach is used instead of dynamic parsing as it
/// might require more space, but should allow caching of data that gets
/// requested multiple times.
AMDILKernelAttr parseSGV(const GlobalValue *GV);
AMDILLocalArg parseLVGV(const GlobalValue *GV);
void parseGlobalAnnotate(const GlobalValue *G);
void parseImageAnnotate(const GlobalValue *G);
void parseConstantPtrAnnotate(const GlobalValue *G);
void printConstantValue(const Constant *CAval,
llvm::raw_ostream& O,
bool asByte);
void parseKernelInformation(const Value *V);
void parseAutoArray(const GlobalValue *G, bool isRegion);
void parseConstantPtr(const GlobalValue *G);
void allocateGlobalCB();
bool checkConstPtrsUseHW(Module::const_iterator *F);
llvm::StringMap<AMDILKernel*> mKernels;
llvm::StringMap<AMDILKernelAttr> mKernelArgs;
llvm::StringMap<AMDILArrayMem> mArrayMems;
llvm::StringMap<AMDILConstPtr> mConstMems;
llvm::StringMap<AMDILLocalArg> mLocalArgs;
llvm::StringMap<uint32_t> mFuncNames;
llvm::DenseMap<const GlobalValue*, uint32_t> mFuncPtrNames;
llvm::DenseMap<uint32_t, llvm::StringRef> mImageNameMap;
std::set<llvm::StringRef> mByteStore;
std::set<llvm::StringRef> mIgnoreStr;
llvm::DenseMap<const Argument *, int32_t> mArgIDMap;
const TypeSymbolTable *symTab;
const AMDILSubtarget *mSTM;
size_t mOffset;
uint32_t mReservedBuffs;
uint32_t mCurrentCPOffset;
uint32_t mPrintfOffset;
};
} // end namespace llvm
#endif // _AMDIL_MACHINE_MODULE_INFO_H_

File diff suppressed because it is too large.


@@ -0,0 +1,71 @@
//===-- AMDILNIDevice.cpp - NI device information --------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#include "AMDILNIDevice.h"
#include "AMDILEvergreenDevice.h"
#include "AMDILSubtarget.h"
using namespace llvm;
AMDILNIDevice::AMDILNIDevice(AMDILSubtarget *ST)
: AMDILEvergreenDevice(ST)
{
std::string name = ST->getDeviceName();
if (name == "caicos") {
mDeviceFlag = OCL_DEVICE_CAICOS;
} else if (name == "turks") {
mDeviceFlag = OCL_DEVICE_TURKS;
} else if (name == "cayman") {
mDeviceFlag = OCL_DEVICE_CAYMAN;
} else {
mDeviceFlag = OCL_DEVICE_BARTS;
}
}
AMDILNIDevice::~AMDILNIDevice()
{
}
size_t
AMDILNIDevice::getMaxLDSSize() const
{
if (usesHardware(AMDILDeviceInfo::LocalMem)) {
return MAX_LDS_SIZE_900;
} else {
return 0;
}
}
uint32_t
AMDILNIDevice::getGeneration() const
{
return AMDILDeviceInfo::HD6XXX;
}
AMDILCaymanDevice::AMDILCaymanDevice(AMDILSubtarget *ST)
: AMDILNIDevice(ST)
{
setCaps();
}
AMDILCaymanDevice::~AMDILCaymanDevice()
{
}
void
AMDILCaymanDevice::setCaps()
{
if (mSTM->isOverride(AMDILDeviceInfo::DoubleOps)) {
mHWBits.set(AMDILDeviceInfo::DoubleOps);
mHWBits.set(AMDILDeviceInfo::FMA);
}
mHWBits.set(AMDILDeviceInfo::Signed24BitOps);
mSWBits.reset(AMDILDeviceInfo::Signed24BitOps);
mSWBits.set(AMDILDeviceInfo::ArenaSegment);
}
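setCaps above tracks features in two bitsets: mHWBits marks capabilities the device accelerates in hardware, while mSWBits marks those that need a software fallback; Cayman promotes Signed24BitOps from the software set to the hardware set. A self-contained sketch of that scheme, with illustrative capability indices (the real ones live in AMDILDeviceInfo):

```cpp
#include <bitset>

// Illustrative capability indices; names mirror AMDILDeviceInfo.
enum Caps { DoubleOps, FMA, Signed24BitOps, ArenaSegment, MaxCaps };

struct DeviceCaps {
  std::bitset<MaxCaps> mHWBits; // feature is hardware-accelerated
  std::bitset<MaxCaps> mSWBits; // feature needs a software fallback

  // Mirrors AMDILCaymanDevice::setCaps: optionally enable doubles/FMA,
  // then move signed 24-bit ops from the software to the hardware path.
  void setCaymanCaps(bool doubleOverride) {
    if (doubleOverride) {
      mHWBits.set(DoubleOps);
      mHWBits.set(FMA);
    }
    mHWBits.set(Signed24BitOps);
    mSWBits.reset(Signed24BitOps);
    mSWBits.set(ArenaSegment);
  }
  bool usesHardware(Caps c) const { return mHWBits.test(c); }
  bool usesSoftware(Caps c) const { return mSWBits.test(c); }
};
```

Keeping the two bitsets disjoint per feature is what lets passes like the machine peephole ask a single `usesSoftware(...)` question to decide whether a fallback rewrite is needed.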


@@ -0,0 +1,59 @@
//===------- AMDILNIDevice.h - Define NI Device for AMDIL -*- C++ -*------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//
// Interface for the subtarget data classes.
//
//===---------------------------------------------------------------------===//
// This file will define the interface that each generation needs to
// implement in order to correctly answer queries on the capabilities of the
// specific hardware.
//===---------------------------------------------------------------------===//
#ifndef _AMDILNIDEVICE_H_
#define _AMDILNIDEVICE_H_
#include "AMDILEvergreenDevice.h"
#include "AMDILSubtarget.h"
namespace llvm {
class AMDILSubtarget;
//===---------------------------------------------------------------------===//
// NI generation of devices and their respective sub classes
//===---------------------------------------------------------------------===//
// The AMDILNIDevice is the base class for the Northern Islands series of
// cards. It is very similar to the AMDILEvergreenDevice, with the major
// exceptions being differences in wavefront size and hardware capabilities.
// The NI devices all have 64-wide wavefronts and also add support for signed
// 24-bit integer operations.
class AMDILNIDevice : public AMDILEvergreenDevice {
public:
AMDILNIDevice(AMDILSubtarget*);
virtual ~AMDILNIDevice();
virtual size_t getMaxLDSSize() const;
virtual uint32_t getGeneration() const;
protected:
}; // AMDILNIDevice
// Just as the AMDILCypressDevice is the double-capable version of the
// AMDILEvergreenDevice, the AMDILCaymanDevice is the double-capable version
// of the AMDILNIDevice. The other major difference, though less important
// here, is that the Cayman device has 4-wide ALUs, whereas the rest of the
// NI family is 5-wide.
class AMDILCaymanDevice: public AMDILNIDevice {
public:
AMDILCaymanDevice(AMDILSubtarget*);
virtual ~AMDILCaymanDevice();
private:
virtual void setCaps();
}; // AMDILCaymanDevice
static const unsigned int MAX_LDS_SIZE_900 = AMDILDevice::MAX_LDS_SIZE_800;
} // namespace llvm
#endif // _AMDILNIDEVICE_H_


@@ -0,0 +1,325 @@
//===- AMDILNodes.td - AMD IL nodes ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Conversion DAG Nodes
//===----------------------------------------------------------------------===//
// Double to Single conversion
def IL_d2f : SDNode<"AMDILISD::DP_TO_FP", SDTIL_DPToFPOp>;
def IL_inttoany : SDNode<"AMDILISD::INTTOANY", SDTIL_IntToAny>;
//===----------------------------------------------------------------------===//
// Flow Control DAG Nodes
//===----------------------------------------------------------------------===//
def IL_brcond : SDNode<"AMDILISD::BRANCH_COND", SDTIL_BRCond, [SDNPHasChain]>;
//===----------------------------------------------------------------------===//
// Comparison DAG Nodes
//===----------------------------------------------------------------------===//
def IL_cmp : SDNode<"AMDILISD::CMP", SDTIL_Cmp>;
//===----------------------------------------------------------------------===//
// Call/Return DAG Nodes
//===----------------------------------------------------------------------===//
def IL_callseq_start : SDNode<"ISD::CALLSEQ_START", SDTIL_CallSeqStart,
[SDNPHasChain, SDNPOutGlue]>;
def IL_callseq_end : SDNode<"ISD::CALLSEQ_END", SDTIL_CallSeqEnd,
[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
def IL_call : SDNode<"AMDILISD::CALL", SDTIL_Call,
[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;
def IL_retflag : SDNode<"AMDILISD::RET_FLAG", SDTNone,
[SDNPHasChain, SDNPOptInGlue]>;
//===----------------------------------------------------------------------===//
// Arithmetic DAG Nodes
//===----------------------------------------------------------------------===//
// Address modification nodes
def IL_addaddrri : SDNode<"AMDILISD::ADDADDR", SDTIL_AddAddrri,
[SDNPCommutative, SDNPAssociative]>;
def IL_addaddrir : SDNode<"AMDILISD::ADDADDR", SDTIL_AddAddrir,
[SDNPCommutative, SDNPAssociative]>;
//===--------------------------------------------------------------------===//
// Instructions
//===--------------------------------------------------------------------===//
// Floating point math functions
def IL_cmov_logical : SDNode<"AMDILISD::CMOVLOG", SDTIL_GenTernaryOp>;
def IL_add : SDNode<"AMDILISD::ADD" , SDTIL_GenBinaryOp>;
def IL_cmov : SDNode<"AMDILISD::CMOV" , SDTIL_GenBinaryOp>;
def IL_or : SDNode<"AMDILISD::OR", SDTIL_GenBinaryOp>;
def IL_and : SDNode<"AMDILISD::AND", SDTIL_GenBinaryOp>;
def IL_xor : SDNode<"AMDILISD::XOR", SDTIL_GenBinaryOp>;
def IL_not : SDNode<"AMDILISD::NOT", SDTIL_GenUnaryOp>;
def IL_div_inf : SDNode<"AMDILISD::DIV_INF", SDTIL_GenBinaryOp>;
def IL_mad : SDNode<"AMDILISD::MAD", SDTIL_GenTernaryOp>;
//===----------------------------------------------------------------------===//
// Integer functions
//===----------------------------------------------------------------------===//
def IL_inegate : SDNode<"AMDILISD::INEGATE" , SDTIntUnaryOp>;
def IL_umul : SDNode<"AMDILISD::UMUL" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;
def IL_mov : SDNode<"AMDILISD::MOVE", SDTIL_GenUnaryOp>;
def IL_phimov : SDNode<"AMDILISD::PHIMOVE", SDTIL_GenUnaryOp>;
def IL_bitconv : SDNode<"AMDILISD::BITCONV", SDTIL_GenBitConv>;
def IL_ffb_hi : SDNode<"AMDILISD::IFFB_HI", SDTIL_GenUnaryOp>;
def IL_ffb_lo : SDNode<"AMDILISD::IFFB_LO", SDTIL_GenUnaryOp>;
def IL_smax : SDNode<"AMDILISD::SMAX", SDTIL_GenBinaryOp>;
//===----------------------------------------------------------------------===//
// Double functions
//===----------------------------------------------------------------------===//
def IL_dcreate : SDNode<"AMDILISD::DCREATE" , SDTIL_DCreate>;
def IL_dcomphi : SDNode<"AMDILISD::DCOMPHI" , SDTIL_DComp>;
def IL_dcomplo : SDNode<"AMDILISD::DCOMPLO" , SDTIL_DComp>;
def IL_dcreate2 : SDNode<"AMDILISD::DCREATE2" , SDTIL_DCreate2>;
def IL_dcomphi2 : SDNode<"AMDILISD::DCOMPHI2" , SDTIL_DComp2>;
def IL_dcomplo2 : SDNode<"AMDILISD::DCOMPLO2" , SDTIL_DComp2>;
//===----------------------------------------------------------------------===//
// Long functions
//===----------------------------------------------------------------------===//
def IL_lcreate : SDNode<"AMDILISD::LCREATE" , SDTIL_LCreate>;
def IL_lcreate2 : SDNode<"AMDILISD::LCREATE2" , SDTIL_LCreate2>;
def IL_lcomphi : SDNode<"AMDILISD::LCOMPHI" , SDTIL_LComp>;
def IL_lcomphi2 : SDNode<"AMDILISD::LCOMPHI2" , SDTIL_LComp2>;
def IL_lcomplo : SDNode<"AMDILISD::LCOMPLO" , SDTIL_LComp>;
def IL_lcomplo2 : SDNode<"AMDILISD::LCOMPLO2" , SDTIL_LComp2>;
//===----------------------------------------------------------------------===//
// Vector functions
//===----------------------------------------------------------------------===//
def IL_vbuild : SDNode<"AMDILISD::VBUILD", SDTIL_GenVecBuild,
[]>;
def IL_vextract : SDNode<"AMDILISD::VEXTRACT", SDTIL_GenVecExtract,
[]>;
def IL_vinsert : SDNode<"AMDILISD::VINSERT", SDTIL_GenVecInsert,
[]>;
def IL_vconcat : SDNode<"AMDILISD::VCONCAT", SDTIL_GenVecConcat,
[]>;
//===----------------------------------------------------------------------===//
// AMDIL Image Custom SDNodes
//===----------------------------------------------------------------------===//
def image2d_read : SDNode<"AMDILISD::IMAGE2D_READ", SDTIL_ImageRead,
[SDNPHasChain, SDNPMayLoad]>;
def image2d_write : SDNode<"AMDILISD::IMAGE2D_WRITE", SDTIL_ImageWrite,
[SDNPHasChain, SDNPMayStore]>;
def image2d_info0 : SDNode<"AMDILISD::IMAGE2D_INFO0", SDTIL_ImageInfo, []>;
def image2d_info1 : SDNode<"AMDILISD::IMAGE2D_INFO1", SDTIL_ImageInfo, []>;
def image3d_read : SDNode<"AMDILISD::IMAGE3D_READ", SDTIL_ImageRead,
[SDNPHasChain, SDNPMayLoad]>;
def image3d_write : SDNode<"AMDILISD::IMAGE3D_WRITE", SDTIL_ImageWrite3D,
[SDNPHasChain, SDNPMayStore]>;
def image3d_info0 : SDNode<"AMDILISD::IMAGE3D_INFO0", SDTIL_ImageInfo, []>;
def image3d_info1 : SDNode<"AMDILISD::IMAGE3D_INFO1", SDTIL_ImageInfo, []>;
//===----------------------------------------------------------------------===//
// AMDIL Atomic Custom SDNodes
//===----------------------------------------------------------------------===//
//===-------------- 32 bit global atomics with return values --------------===//
def atom_g_add : SDNode<"AMDILISD::ATOM_G_ADD", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_and : SDNode<"AMDILISD::ATOM_G_AND", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_cmpxchg : SDNode<"AMDILISD::ATOM_G_CMPXCHG", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_dec : SDNode<"AMDILISD::ATOM_G_DEC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_inc : SDNode<"AMDILISD::ATOM_G_INC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_max : SDNode<"AMDILISD::ATOM_G_MAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_umax : SDNode<"AMDILISD::ATOM_G_UMAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_min : SDNode<"AMDILISD::ATOM_G_MIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_umin : SDNode<"AMDILISD::ATOM_G_UMIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_or : SDNode<"AMDILISD::ATOM_G_OR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_sub : SDNode<"AMDILISD::ATOM_G_SUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_rsub : SDNode<"AMDILISD::ATOM_G_RSUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_xchg : SDNode<"AMDILISD::ATOM_G_XCHG", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_xor : SDNode<"AMDILISD::ATOM_G_XOR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
//===------------- 32 bit global atomics without return values ------------===//
def atom_g_add_noret : SDNode<"AMDILISD::ATOM_G_ADD_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_and_noret : SDNode<"AMDILISD::ATOM_G_AND_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_cmpxchg_noret : SDNode<"AMDILISD::ATOM_G_CMPXCHG_NORET",
SDTIL_TriAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_cmp_noret : SDNode<"AMDILISD::ATOM_G_CMPXCHG_NORET",
SDTIL_TriAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_dec_noret : SDNode<"AMDILISD::ATOM_G_DEC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_inc_noret : SDNode<"AMDILISD::ATOM_G_INC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_max_noret : SDNode<"AMDILISD::ATOM_G_MAX_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_umax_noret: SDNode<"AMDILISD::ATOM_G_UMAX_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_min_noret : SDNode<"AMDILISD::ATOM_G_MIN_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_umin_noret: SDNode<"AMDILISD::ATOM_G_UMIN_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_or_noret : SDNode<"AMDILISD::ATOM_G_OR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_sub_noret : SDNode<"AMDILISD::ATOM_G_SUB_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_rsub_noret : SDNode<"AMDILISD::ATOM_G_RSUB_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_g_xchg_noret: SDNode<"AMDILISD::ATOM_G_XCHG_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_g_xor_noret : SDNode<"AMDILISD::ATOM_G_XOR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
//===--------------- 32 bit local atomics with return values --------------===//
def atom_l_add : SDNode<"AMDILISD::ATOM_L_ADD", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_and : SDNode<"AMDILISD::ATOM_L_AND", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_cmpxchg : SDNode<"AMDILISD::ATOM_L_CMPXCHG", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_dec : SDNode<"AMDILISD::ATOM_L_DEC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_inc : SDNode<"AMDILISD::ATOM_L_INC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_max : SDNode<"AMDILISD::ATOM_L_MAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_umax : SDNode<"AMDILISD::ATOM_L_UMAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_min : SDNode<"AMDILISD::ATOM_L_MIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_umin : SDNode<"AMDILISD::ATOM_L_UMIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_or : SDNode<"AMDILISD::ATOM_L_OR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_mskor : SDNode<"AMDILISD::ATOM_L_MSKOR", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_sub : SDNode<"AMDILISD::ATOM_L_SUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_rsub : SDNode<"AMDILISD::ATOM_L_RSUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_xchg : SDNode<"AMDILISD::ATOM_L_XCHG", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_xor : SDNode<"AMDILISD::ATOM_L_XOR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
//===-------------- 32 bit local atomics without return values ------------===//
def atom_l_add_noret : SDNode<"AMDILISD::ATOM_L_ADD_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_and_noret : SDNode<"AMDILISD::ATOM_L_AND_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_cmpxchg_noret : SDNode<"AMDILISD::ATOM_L_CMPXCHG_NORET",
SDTIL_TriAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_dec_noret : SDNode<"AMDILISD::ATOM_L_DEC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_inc_noret : SDNode<"AMDILISD::ATOM_L_INC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_max_noret : SDNode<"AMDILISD::ATOM_L_MAX_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_umax_noret: SDNode<"AMDILISD::ATOM_L_UMAX_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_min_noret : SDNode<"AMDILISD::ATOM_L_MIN_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_umin_noret: SDNode<"AMDILISD::ATOM_L_UMIN_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_or_noret : SDNode<"AMDILISD::ATOM_L_OR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_mskor_noret : SDNode<"AMDILISD::ATOM_L_MSKOR_NORET",
SDTIL_TriAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_sub_noret : SDNode<"AMDILISD::ATOM_L_SUB_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_rsub_noret : SDNode<"AMDILISD::ATOM_L_RSUB_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_l_xchg_noret: SDNode<"AMDILISD::ATOM_L_XCHG_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_l_xor_noret : SDNode<"AMDILISD::ATOM_L_XOR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
//===--------------- 32 bit region atomics with return values -------------===//
def atom_r_add : SDNode<"AMDILISD::ATOM_R_ADD", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_and : SDNode<"AMDILISD::ATOM_R_AND", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_cmpxchg : SDNode<"AMDILISD::ATOM_R_CMPXCHG", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_dec : SDNode<"AMDILISD::ATOM_R_DEC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_inc : SDNode<"AMDILISD::ATOM_R_INC", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_max : SDNode<"AMDILISD::ATOM_R_MAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_umax : SDNode<"AMDILISD::ATOM_R_UMAX", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_min : SDNode<"AMDILISD::ATOM_R_MIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_umin : SDNode<"AMDILISD::ATOM_R_UMIN", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_or : SDNode<"AMDILISD::ATOM_R_OR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_mskor : SDNode<"AMDILISD::ATOM_R_MSKOR", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_sub : SDNode<"AMDILISD::ATOM_R_SUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_rsub : SDNode<"AMDILISD::ATOM_R_RSUB", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_xchg : SDNode<"AMDILISD::ATOM_R_XCHG", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_xor : SDNode<"AMDILISD::ATOM_R_XOR", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
//===-------------- 32 bit region atomics without return values -----------===//
def atom_r_add_noret : SDNode<"AMDILISD::ATOM_R_ADD_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_and_noret : SDNode<"AMDILISD::ATOM_R_AND_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_cmpxchg_noret : SDNode<"AMDILISD::ATOM_R_CMPXCHG_NORET",
SDTIL_TriAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_dec_noret : SDNode<"AMDILISD::ATOM_R_DEC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_inc_noret : SDNode<"AMDILISD::ATOM_R_INC_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_max_noret : SDNode<"AMDILISD::ATOM_R_MAX_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_umax_noret: SDNode<"AMDILISD::ATOM_R_UMAX_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_min_noret : SDNode<"AMDILISD::ATOM_R_MIN_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_umin_noret: SDNode<"AMDILISD::ATOM_R_UMIN_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_or_noret : SDNode<"AMDILISD::ATOM_R_OR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_mskor_noret : SDNode<"AMDILISD::ATOM_R_MSKOR_NORET", SDTIL_TriAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_sub_noret : SDNode<"AMDILISD::ATOM_R_SUB_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_rsub_noret : SDNode<"AMDILISD::ATOM_R_RSUB_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def atom_r_xchg_noret: SDNode<"AMDILISD::ATOM_R_XCHG_NORET",
SDTIL_BinAtom, [SDNPHasChain, SDNPMayLoad, SDNPMayStore, SDNPMemOperand]>;
def atom_r_xor_noret : SDNode<"AMDILISD::ATOM_R_XOR_NORET", SDTIL_BinAtom,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
//===--------------- 32 bit atomic counter instructions -------------------===//
def append_alloc : SDNode<"AMDILISD::APPEND_ALLOC", SDTIL_Append,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore]>;
def append_consume : SDNode<"AMDILISD::APPEND_CONSUME", SDTIL_Append,
[SDNPHasChain, SDNPMayLoad, SDNPMayStore]>;
def append_alloc_noret : SDNode<"AMDILISD::APPEND_ALLOC_NORET", SDTIL_Append,
[SDNPHasChain, SDNPMayStore]>;
def append_consume_noret : SDNode<"AMDILISD::APPEND_CONSUME_NORET",
SDTIL_Append, [SDNPHasChain, SDNPMayStore]>;
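How these SDNodes get selected is not shown in this chunk; as a minimal illustrative sketch (the instruction `LDS_ADD` is an assumption, not from the source), a selection pattern elsewhere in the backend could tie the local atomic add node to a machine instruction:

```tablegen
// Hypothetical selection pattern for the local atomic add node above.
// LDS_ADD is an illustrative instruction name; GPRI32 and ADDR are the
// register class and complex addressing pattern used elsewhere in AMDIL.
def : Pat<(i32 (atom_l_add ADDR:$ptr, GPRI32:$val)),
          (LDS_ADD ADDR:$ptr, GPRI32:$val)>;
```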

@@ -0,0 +1,37 @@
//===- AMDILOperands.td - AMD IL Operands ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Custom memory operand
//===----------------------------------------------------------------------===//
def MEMI32 : Operand<i32> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops GPRI32, GPRI32);
}
def MEMI64 : Operand<i64> {
let PrintMethod = "printMemOperand";
let MIOperandInfo = (ops GPRI64, GPRI64);
}
// Call target types
def calltarget : Operand<i32>;
def brtarget : Operand<OtherVT>;
// def v2i8imm : Operand<v2i8>;
// def v4i8imm : Operand<v4i8>;
// def v2i16imm : Operand<v2i16>;
// def v4i16imm : Operand<v4i16>;
// def v2i32imm : Operand<v2i32>;
// def v4i32imm : Operand<v4i32>;
// def v2i64imm : Operand<v2i64>;
// def v2f32imm : Operand<v2f32>;
// def v4f32imm : Operand<v4f32>;
// def v2f64imm : Operand<v2f64>;

@@ -0,0 +1,504 @@
//===- AMDILPatterns.td - AMDIL Target Patterns ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
// Store pattern fragments
//===----------------------------------------------------------------------===//
def truncstorei64 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::i64;
}]>;
def truncstorev2i8 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2i8;
}]>;
def truncstorev2i16 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2i16;
}]>;
def truncstorev2i32 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2i32;
}]>;
def truncstorev2i64 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2i64;
}]>;
def truncstorev2f32 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2f32;
}]>;
def truncstorev2f64 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v2f64;
}]>;
def truncstorev4i8 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v4i8;
}]>;
def truncstorev4i16 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v4i16;
}]>;
def truncstorev4i32 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v4i32;
}]>;
def truncstorev4f32 : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v4f32;
}]>;
def global_store : PatFrag<(ops node:$val, node:$ptr),
(store node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_store : PatFrag<(ops node:$val, node:$ptr),
(store node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_store : PatFrag<(ops node:$val, node:$ptr),
(store node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_store : PatFrag<(ops node:$val, node:$ptr),
(store node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei8 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei16 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei64 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref64 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i8 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i16 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i64 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v2f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f64 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v4i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i8 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v4i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i16 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v4i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def global_v4f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4f32 node:$val, node:$ptr), [{
return isGlobalStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei8 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei16 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei64 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref64 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i8 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i16 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i64 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v2f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f64 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v4i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i8 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v4i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i16 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v4i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def private_v4f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4f32 node:$val, node:$ptr), [{
return isPrivateStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei8 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei16 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei64 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref64 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i8 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i16 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i64 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v2f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f64 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v4i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i8 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v4i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i16 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v4i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def local_v4f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4f32 node:$val, node:$ptr), [{
return isLocalStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstore node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei8 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei16 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorei64 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstoref64 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i8 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i16 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2i64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2i64 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v2f64trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev2f64 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v4i8trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i8 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v4i16trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i16 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v4i32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4i32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
def region_v4f32trunc_store : PatFrag<(ops node:$val, node:$ptr),
(truncstorev4f32 node:$val, node:$ptr), [{
return isRegionStore(dyn_cast<StoreSDNode>(N));
}]>;
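Each address-space store fragment above pairs a generic store node with a C++ predicate, so instruction patterns can match on the address space directly. A hedged sketch of such a use (the instruction name `UAV_BYTE_STORE` is hypothetical):

```tablegen
// Hypothetical use of a fragment defined above: select a global
// byte-truncating store to an illustrative UAV store instruction.
def : Pat<(global_i8trunc_store GPRI32:$val, ADDR:$ptr),
          (UAV_BYTE_STORE GPRI32:$val, ADDR:$ptr)>;
```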
//===----------------------------------------------------------------------===//
// Load pattern fragments
//===----------------------------------------------------------------------===//
// Global address space loads
def global_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isGlobalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def global_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isGlobalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def global_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isGlobalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def global_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isGlobalLoad(dyn_cast<LoadSDNode>(N));
}]>;
// Private address space loads
def private_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isPrivateLoad(dyn_cast<LoadSDNode>(N));
}]>;
def private_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isPrivateLoad(dyn_cast<LoadSDNode>(N));
}]>;
def private_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isPrivateLoad(dyn_cast<LoadSDNode>(N));
}]>;
def private_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isPrivateLoad(dyn_cast<LoadSDNode>(N));
}]>;
// Local address space loads
def local_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isLocalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def local_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isLocalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def local_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isLocalLoad(dyn_cast<LoadSDNode>(N));
}]>;
def local_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isLocalLoad(dyn_cast<LoadSDNode>(N));
}]>;
// Region address space loads
def region_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isRegionLoad(dyn_cast<LoadSDNode>(N));
}]>;
def region_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isRegionLoad(dyn_cast<LoadSDNode>(N));
}]>;
def region_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isRegionLoad(dyn_cast<LoadSDNode>(N));
}]>;
def region_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isRegionLoad(dyn_cast<LoadSDNode>(N));
}]>;
// Constant address space loads
def constant_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isConstantLoad(dyn_cast<LoadSDNode>(N), -1);
}]>;
def constant_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isConstantLoad(dyn_cast<LoadSDNode>(N), -1);
}]>;
def constant_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isConstantLoad(dyn_cast<LoadSDNode>(N), -1);
}]>;
def constant_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isConstantLoad(dyn_cast<LoadSDNode>(N), -1);
}]>;
// Constant pool loads
def cp_load : PatFrag<(ops node:$ptr), (load node:$ptr), [{
return isCPLoad(dyn_cast<LoadSDNode>(N));
}]>;
def cp_sext_load : PatFrag<(ops node:$ptr), (sextload node:$ptr), [{
return isCPLoad(dyn_cast<LoadSDNode>(N));
}]>;
def cp_zext_load : PatFrag<(ops node:$ptr), (zextload node:$ptr), [{
return isCPLoad(dyn_cast<LoadSDNode>(N));
}]>;
def cp_aext_load : PatFrag<(ops node:$ptr), (extload node:$ptr), [{
return isCPLoad(dyn_cast<LoadSDNode>(N));
}]>;
//===----------------------------------------------------------------------===//
// Complex addressing mode patterns
//===----------------------------------------------------------------------===//
def ADDR : ComplexPattern<i32, 2, "SelectADDR", [], []>;
def ADDRF : ComplexPattern<i32, 2, "SelectADDR", [frameindex], []>;
def ADDR64 : ComplexPattern<i64, 2, "SelectADDR64", [], []>;
def ADDR64F : ComplexPattern<i64, 2, "SelectADDR64", [frameindex], []>;
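A ComplexPattern defers operand matching to a named C++ routine (here `SelectADDR`/`SelectADDR64`), which decomposes the address into the two sub-operands declared by MEMI32/MEMI64. As an illustrative sketch of how a load pattern would consume it (the instruction name is an assumption, not from the source):

```tablegen
// Hypothetical load pattern: the ADDR complex pattern hands the address
// to the C++ SelectADDR routine, which fills in the two GPRI32
// sub-operands that MEMI32 declares.
def : Pat<(i32 (global_load ADDR:$addr)),
          (UAV_RAW_LOAD_I32 ADDR:$addr)>;
```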
//===----------------------------------------------------------------------===//
// Conditional Instruction Pattern Leafs
//===----------------------------------------------------------------------===//
class IL_CC_Op<int N> : PatLeaf<(i32 N)>;
def IL_CC_D_EQ : IL_CC_Op<0>;
def IL_CC_D_GE : IL_CC_Op<1>;
def IL_CC_D_LT : IL_CC_Op<2>;
def IL_CC_D_NE : IL_CC_Op<3>;
def IL_CC_F_EQ : IL_CC_Op<4>;
def IL_CC_F_GE : IL_CC_Op<5>;
def IL_CC_F_LT : IL_CC_Op<6>;
def IL_CC_F_NE : IL_CC_Op<7>;
def IL_CC_I_EQ : IL_CC_Op<8>;
def IL_CC_I_GE : IL_CC_Op<9>;
def IL_CC_I_LT : IL_CC_Op<10>;
def IL_CC_I_NE : IL_CC_Op<11>;
def IL_CC_U_GE : IL_CC_Op<12>;
def IL_CC_U_LT : IL_CC_Op<13>;
// Pseudo IL comparison instructions that aren't natively supported
def IL_CC_F_GT : IL_CC_Op<14>;
def IL_CC_U_GT : IL_CC_Op<15>;
def IL_CC_I_GT : IL_CC_Op<16>;
def IL_CC_D_GT : IL_CC_Op<17>;
def IL_CC_F_LE : IL_CC_Op<18>;
def IL_CC_U_LE : IL_CC_Op<19>;
def IL_CC_I_LE : IL_CC_Op<20>;
def IL_CC_D_LE : IL_CC_Op<21>;
def IL_CC_F_UNE : IL_CC_Op<22>;
def IL_CC_F_UEQ : IL_CC_Op<23>;
def IL_CC_F_ULT : IL_CC_Op<24>;
def IL_CC_F_UGT : IL_CC_Op<25>;
def IL_CC_F_ULE : IL_CC_Op<26>;
def IL_CC_F_UGE : IL_CC_Op<27>;
def IL_CC_F_ONE : IL_CC_Op<28>;
def IL_CC_F_OEQ : IL_CC_Op<29>;
def IL_CC_F_OLT : IL_CC_Op<30>;
def IL_CC_F_OGT : IL_CC_Op<31>;
def IL_CC_F_OLE : IL_CC_Op<32>;
def IL_CC_F_OGE : IL_CC_Op<33>;
def IL_CC_D_UNE : IL_CC_Op<34>;
def IL_CC_D_UEQ : IL_CC_Op<35>;
def IL_CC_D_ULT : IL_CC_Op<36>;
def IL_CC_D_UGT : IL_CC_Op<37>;
def IL_CC_D_ULE : IL_CC_Op<38>;
def IL_CC_D_UGE : IL_CC_Op<39>;
def IL_CC_D_ONE : IL_CC_Op<40>;
def IL_CC_D_OEQ : IL_CC_Op<41>;
def IL_CC_D_OLT : IL_CC_Op<42>;
def IL_CC_D_OGT : IL_CC_Op<43>;
def IL_CC_D_OLE : IL_CC_Op<44>;
def IL_CC_D_OGE : IL_CC_Op<45>;
def IL_CC_U_EQ : IL_CC_Op<46>;
def IL_CC_U_NE : IL_CC_Op<47>;
def IL_CC_F_O : IL_CC_Op<48>;
def IL_CC_D_O : IL_CC_Op<49>;
def IL_CC_F_UO : IL_CC_Op<50>;
def IL_CC_D_UO : IL_CC_Op<51>;
def IL_CC_L_LE : IL_CC_Op<52>;
def IL_CC_L_GE : IL_CC_Op<53>;
def IL_CC_L_EQ : IL_CC_Op<54>;
def IL_CC_L_NE : IL_CC_Op<55>;
def IL_CC_L_LT : IL_CC_Op<56>;
def IL_CC_L_GT : IL_CC_Op<57>;
def IL_CC_UL_LE : IL_CC_Op<58>;
def IL_CC_UL_GE : IL_CC_Op<59>;
def IL_CC_UL_EQ : IL_CC_Op<60>;
def IL_CC_UL_NE : IL_CC_Op<61>;
def IL_CC_UL_LT : IL_CC_Op<62>;
def IL_CC_UL_GT : IL_CC_Op<63>;

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,209 @@
//===-------- AMDILPointerManager.h - Manage Pointers for HW ------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
// The AMDIL Pointer Manager is a class that does all the checking for
// different pointer characteristics. Pointers have attributes that need
// to be attached to them in order to codegen them correctly and efficiently.
// This class will analyze the pointers of a function and then traverse the uses
// of the pointers and determine if a pointer can be cached, should belong in
// the arena, and what UAV it should belong to. There are separate classes for
// each unique generation of devices. This pass only works in SSA form.
//===----------------------------------------------------------------------===//
#ifndef _AMDIL_POINTER_MANAGER_H_
#define _AMDIL_POINTER_MANAGER_H_
#undef DEBUG_TYPE
#undef DEBUGME
#define DEBUG_TYPE "PointerManager"
#if !defined(NDEBUG)
#define DEBUGME (DebugFlag && isCurrentDebugType(DEBUG_TYPE))
#else
#define DEBUGME (false)
#endif
#include "AMDIL.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetMachine.h"
#include <list>
#include <map>
#include <queue>
#include <set>
namespace llvm {
class Value;
class MachineBasicBlock;
// Typedefing the multiple different set types so that it is
// easier to read what each set is supposed to handle. This
// also makes it easier to track which set goes to which
// argument in a function call.
typedef std::set<const Value*> PtrSet;
// A Byte set is the set of all base pointers that must
// be allocated to the arena path.
typedef PtrSet ByteSet;
// A Raw set is the set of all base pointers that can be
// allocated to the raw path.
typedef PtrSet RawSet;
// A cacheable set is the set of all base pointers that
// are deemed cacheable based on annotations or
// compiler options.
typedef PtrSet CacheableSet;
// A conflict set is a set of all base pointers whose
// use/def chains conflict with another base pointer.
typedef PtrSet ConflictSet;
// An image set is a set of all read/write only image pointers.
typedef PtrSet ImageSet;
// An append set is a set of atomic counter base pointers
typedef std::vector<const Value*> AppendSet;
// A ConstantSet is a set of constant pool instructions
typedef std::set<MachineInstr*> CPoolSet;
// A CacheableInstrSet is a set of instructions that are cacheable
// even if the pointer is not generally cacheable.
typedef std::set<MachineInstr*> CacheableInstrSet;
// A pair that maps a virtual register to the equivalent base
// pointer value that it was derived from.
typedef std::pair<unsigned, const Value*> RegValPair;
// A map that maps between the base pointer value and an array
// of instructions that are part of the pointer chain. A pointer
// chain is a recursive def/use chain of all instructions that don't
// store data to memory unless the pointer is the data being stored.
typedef std::map<const Value*, std::vector<MachineInstr*> > PtrIMap;
// A map that holds a set of all base pointers that are used in a machine
// instruction. This helps to detect when conflicting pointers
// are found, such as when pointer subtraction occurs.
typedef std::map<MachineInstr*, PtrSet> InstPMap;
// A map that holds the frame index to RegValPair so that writes of
// pointers to the stack can be tracked.
typedef std::map<unsigned, RegValPair > FIPMap;
// A map that holds all of the register to base pointer
// mappings for a given function.
typedef std::map<unsigned, RegValPair> RVPVec;
// The default pointer manager. This handles pointer
// resource allocation for default ID's only.
// There is no special processing.
class AMDILPointerManager : public MachineFunctionPass
{
public:
AMDILPointerManager(
TargetMachine &tm
AMDIL_OPT_LEVEL_DECL);
virtual ~AMDILPointerManager();
virtual const char*
getPassName() const;
virtual bool
runOnMachineFunction(MachineFunction &F);
virtual void
getAnalysisUsage(AnalysisUsage &AU) const;
static char ID;
protected:
bool mDebug;
private:
TargetMachine &TM;
}; // class AMDILPointerManager
// The pointer manager for Evergreen and Northern Island
// devices. This pointer manager allocates and tracks
// cached memory, arena resources, raw resources and
// whether multi-uav is utilized or not.
class AMDILEGPointerManager : public AMDILPointerManager
{
public:
AMDILEGPointerManager(
TargetMachine &tm
AMDIL_OPT_LEVEL_DECL);
virtual ~AMDILEGPointerManager();
virtual const char*
getPassName() const;
virtual bool
runOnMachineFunction(MachineFunction &F);
private:
TargetMachine &TM;
}; // class AMDILEGPointerManager
// Information related to the cacheability of instructions in a basic block.
// This is used during the parse phase of the pointer algorithm to track
// the reachability of stores within a basic block.
class BlockCacheableInfo {
public:
BlockCacheableInfo() :
mStoreReachesTop(false),
mStoreReachesExit(false),
mCacheableSet()
{}
bool storeReachesTop() const { return mStoreReachesTop; }
bool storeReachesExit() const { return mStoreReachesExit; }
CacheableInstrSet::const_iterator
cacheableBegin() const { return mCacheableSet.begin(); }
CacheableInstrSet::const_iterator
cacheableEnd() const { return mCacheableSet.end(); }
// Mark the block as having a global store that reaches it. This
// will also set the store reaches exit flag, and clear the list
// of loads (since they are now reachable by a store.)
bool setReachesTop() {
bool changedExit = !mStoreReachesExit;
if (!mStoreReachesTop)
mCacheableSet.clear();
mStoreReachesTop = true;
mStoreReachesExit = true;
return changedExit;
}
// Mark the block as having a store that reaches the exit of the
// block.
void setReachesExit() {
mStoreReachesExit = true;
}
// If neither the top nor the exit of the block is marked as
// reachable by a store, add the load to the list of cacheable loads.
void addPossiblyCacheableInst(const TargetMachine * tm, MachineInstr *load) {
// By definition, if store reaches top, then store reaches exit.
// So, we only test for exit here.
// If we have a volatile load we cannot cache it.
if (mStoreReachesExit || isVolatileInst(tm->getInstrInfo(), load)) {
return;
}
mCacheableSet.insert(load);
}
private:
bool mStoreReachesTop; // Does a global store reach the top of this block?
bool mStoreReachesExit;// Does a global store reach the exit of this block?
CacheableInstrSet mCacheableSet; // The set of loads in the block not
// reachable by a global store.
};
// Map from MachineBasicBlock to its cacheable load info.
typedef std::map<MachineBasicBlock*, BlockCacheableInfo> MBBCacheableMap;
} // end llvm namespace
#endif // _AMDIL_POINTER_MANAGER_H_


@@ -0,0 +1,293 @@
//===-- AMDILPrintfConvert.cpp - Printf Conversion pass --===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//==-----------------------------------------------------------------------===//
#define DEBUG_TYPE "PrintfConvert"
#ifdef DEBUG
#define DEBUGME (DebugFlag && isCurrentDebugType(DEBUG_TYPE))
#else
#define DEBUGME 0
#endif
#include "AMDILAlgorithms.tpp"
#include "AMDILKernelManager.h"
#include "AMDILMachineFunctionInfo.h"
#include "AMDILModuleInfo.h"
#include "AMDILTargetMachine.h"
#include "AMDILUtilityFunctions.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineFunctionAnalysis.h"
#include "llvm/CodeGen/Passes.h"
#include "llvm/GlobalVariable.h"
#include "llvm/Instructions.h"
#include "llvm/Module.h"
#include "llvm/Type.h"
#include <cstdio>
using namespace llvm;
namespace
{
class LLVM_LIBRARY_VISIBILITY AMDILPrintfConvert : public FunctionPass
{
public:
TargetMachine &TM;
static char ID;
AMDILPrintfConvert(TargetMachine &tm AMDIL_OPT_LEVEL_DECL);
~AMDILPrintfConvert();
const char* getPassName() const;
bool runOnFunction(Function &F);
bool doInitialization(Module &M);
bool doFinalization(Module &M);
void getAnalysisUsage(AnalysisUsage &AU) const;
private:
bool expandPrintf(BasicBlock::iterator *bbb);
AMDILMachineFunctionInfo *mMFI;
AMDILKernelManager *mKM;
bool mChanged;
SmallVector<int64_t, DEFAULT_VEC_SLOTS> bVecMap;
};
char AMDILPrintfConvert::ID = 0;
} // anonymous namespace
namespace llvm
{
FunctionPass*
createAMDILPrintfConvert(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
{
return new AMDILPrintfConvert(tm AMDIL_OPT_LEVEL_VAR);
}
} // llvm namespace
AMDILPrintfConvert::AMDILPrintfConvert(TargetMachine &tm AMDIL_OPT_LEVEL_DECL)
: FunctionPass(ID), TM(tm)
{
}
AMDILPrintfConvert::~AMDILPrintfConvert()
{
}
bool
AMDILPrintfConvert::expandPrintf(BasicBlock::iterator *bbb)
{
Instruction *inst = (*bbb);
CallInst *CI = dyn_cast<CallInst>(inst);
if (!CI) {
return false;
}
int num_ops = CI->getNumOperands();
if (!num_ops) {
return false;
}
if (CI->getOperand(num_ops - 1)->getName() != "printf") {
return false;
}
Function *mF = inst->getParent()->getParent();
uint64_t bytes = 0;
mChanged = true;
if (num_ops == 1) {
++(*bbb);
Constant *newConst = ConstantInt::getSigned(CI->getType(), bytes);
CI->replaceAllUsesWith(newConst);
CI->eraseFromParent();
return mChanged;
}
// Deal with the string here
Value *op = CI->getOperand(0);
ConstantExpr *GEPinst = dyn_cast<ConstantExpr>(op);
if (GEPinst) {
GlobalVariable *GVar
= dyn_cast<GlobalVariable>(GEPinst->getOperand(0));
std::string str = "unknown";
if (GVar && GVar->hasInitializer()) {
ConstantDataArray *CA
= dyn_cast<ConstantDataArray>(GVar->getInitializer());
if (CA && CA->isString()) {
str = CA->getAsString();
}
}
uint64_t id = (uint64_t)mMFI->addPrintfString(str,
getAnalysis<MachineFunctionAnalysis>().getMF()
.getMMI().getObjFileInfo<AMDILModuleInfo>().get_printf_offset());
std::string name = "___dumpStringID";
Function *nF = NULL;
std::vector<Type*> types;
types.push_back(Type::getInt32Ty(mF->getContext()));
nF = mF->getParent()->getFunction(name);
if (!nF) {
nF = Function::Create(
FunctionType::get(
Type::getVoidTy(mF->getContext()), types, false),
GlobalValue::ExternalLinkage,
name, mF->getParent());
}
Constant *C = ConstantInt::get(
Type::getInt32Ty(mF->getContext()), id, false);
CallInst *nCI = CallInst::Create(nF, C);
nCI->insertBefore(CI);
bytes = strlen(str.data());
for (uint32_t x = 1, y = num_ops - 1; x < y; ++x) {
op = CI->getOperand(x);
Type *oType = op->getType();
uint32_t eleCount = getNumElements(oType);
uint32_t eleSize = (uint32_t)GET_SCALAR_SIZE(oType);
if (!eleSize) {
// Default size is 32bits.
eleSize = 32;
}
if (!eleCount) {
// Default num elements is 1.
eleCount = 1;
}
uint32_t totalSize = eleCount * eleSize;
mMFI->addPrintfOperand(str, (x - 1),
(uint32_t)totalSize);
}
}
for (uint32_t x = 1, y = num_ops - 1; x < y; ++x) {
op = CI->getOperand(x);
Type *oType = op->getType();
if (oType->isFPOrFPVectorTy()
&& (oType->getTypeID() != Type::VectorTyID)) {
Type *iType = NULL;
if (oType->isFloatTy()) {
iType = dyn_cast<Type>(
Type::getInt32Ty(oType->getContext()));
} else {
iType = dyn_cast<Type>(
Type::getInt64Ty(oType->getContext()));
}
op = new BitCastInst(op, iType, "printfBitCast", CI);
} else if (oType->getTypeID() == Type::VectorTyID) {
Type *iType = NULL;
uint32_t eleCount = getNumElements(oType);
uint32_t eleSize = (uint32_t)GET_SCALAR_SIZE(oType);
uint32_t totalSize = eleCount * eleSize;
switch (eleSize) {
default:
eleCount = totalSize / 64;
iType = dyn_cast<Type>(
Type::getInt64Ty(oType->getContext()));
break;
case 8:
if (eleCount >= 8) {
eleCount = totalSize / 64;
iType = dyn_cast<Type>(
Type::getInt64Ty(oType->getContext()));
} else if (eleCount >= 4) {
eleCount = 1;
iType = dyn_cast<Type>(
Type::getInt32Ty(oType->getContext()));
} else {
eleCount = 1;
iType = dyn_cast<Type>(
Type::getInt16Ty(oType->getContext()));
}
break;
case 16:
if (eleCount >= 4) {
eleCount = totalSize / 64;
iType = dyn_cast<Type>(
Type::getInt64Ty(oType->getContext()));
} else {
eleCount = 1;
iType = dyn_cast<Type>(
Type::getInt32Ty(oType->getContext()));
}
break;
}
if (eleCount > 1) {
iType = dyn_cast<Type>(
VectorType::get(iType, eleCount));
}
op = new BitCastInst(op, iType, "printfBitCast", CI);
}
char buffer[256];
uint32_t size = (uint32_t)GET_SCALAR_SIZE(oType);
if (size) {
sprintf(buffer, "___dumpBytes_v%db%u",
1,
(uint32_t)getNumElements(oType) * (uint32_t)size);
} else {
const PointerType *PT = dyn_cast<PointerType>(oType);
if (PT->getAddressSpace() == 0 &&
GET_SCALAR_SIZE(PT->getContainedType(0)) == 8
&& getNumElements(PT->getContainedType(0)) == 1) {
op = new BitCastInst(op,
Type::getInt8PtrTy(oType->getContext(),
AMDILAS::CONSTANT_ADDRESS),
"printfPtrCast", CI);
sprintf(buffer, "___dumpBytes_v%dbs", 1);
} else {
op = new PtrToIntInst(op,
Type::getInt32Ty(oType->getContext()),
"printfPtrCast", CI);
sprintf(buffer, "___dumpBytes_v1b32");
}
}
std::vector<Type*> types;
types.push_back(op->getType());
std::string name = buffer;
Function *nF = NULL;
nF = mF->getParent()->getFunction(name);
if (!nF) {
nF = Function::Create(
FunctionType::get(
Type::getVoidTy(mF->getContext()), types, false),
GlobalValue::ExternalLinkage,
name, mF->getParent());
}
CallInst *nCI = CallInst::Create(nF, op);
nCI->insertBefore(CI);
bytes += (size - 4);
}
++(*bbb);
Constant *newConst = ConstantInt::getSigned(CI->getType(), bytes);
CI->replaceAllUsesWith(newConst);
CI->eraseFromParent();
return mChanged;
}
bool
AMDILPrintfConvert::runOnFunction(Function &MF)
{
mChanged = false;
mKM = TM.getSubtarget<AMDILSubtarget>().getKernelManager();
mMFI = getAnalysis<MachineFunctionAnalysis>().getMF()
.getInfo<AMDILMachineFunctionInfo>();
bVecMap.clear();
safeNestedForEach(MF.begin(), MF.end(), MF.begin()->begin(),
std::bind1st(
std::mem_fun(
&AMDILPrintfConvert::expandPrintf), this));
return mChanged;
}
const char*
AMDILPrintfConvert::getPassName() const
{
return "AMDIL Printf Conversion Pass";
}
bool
AMDILPrintfConvert::doInitialization(Module &M)
{
return false;
}
bool
AMDILPrintfConvert::doFinalization(Module &M)
{
return false;
}
void
AMDILPrintfConvert::getAnalysisUsage(AnalysisUsage &AU) const
{
AU.addRequired<MachineFunctionAnalysis>();
FunctionPass::getAnalysisUsage(AU);
AU.setPreservesAll();
}
