You can get rid of your dependent nested loops, but at the cost of wasted memory (more so than usual in vectorization problems). Your for loop addresses only a small subset of a 3d box. If you don't mind generating (n+1)^3 items in order to use only (n+1)(n+2)(n+3)/6 of them, and the full box fits into memory, the vectorized version will likely be much faster.
My suggestion, explanation below:
import numpy as np

def tet_vect(n):
    xv = np.array([
        [-1., -1., -1.],
        [ 1., -1., -1.],
        [-1.,  1., -1.],
        [-1., -1.,  1.],
    ])
    # spanning arrays of a 3d grid according to range(0, n+1)
    ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
    # indices of the triples which fall inside the original for loop
    inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
    # the [i,j,k] indices of the points that fall inside the for loop, in the same order
    combs = np.vstack(np.where(inds)).T
    # combs is now an (nsize, 3)-shaped array
    # compute the "l" column too
    lcol = n - combs.sum(axis=1)
    combs = np.hstack((combs, lcol[:, None]))
    # combs is now an (nsize, 4)-shaped array
    # all that's left is the matrix product of combs and xv, divided by n
    xg = np.matmul(combs, xv) / n
    return xg
The key step is generating the ii,jj,kk indices which span a 3d grid. This step is memory-efficient, since np.ogrid creates spanning arrays that can be used in broadcasting operations to refer to your full grid. We only generate a full (n+1)^3-sized array in the next step: the inds boolean array selects those points in the 3d box that lie inside your region of interest (and it does so by making use of array broadcasting). In the following step np.where(inds) picks out the indices of the True elements of this large array, and we end up with the much smaller number of nsize index triples. The single memory-wasting step is thus the creation of inds.
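To see the broadcasting at work, here's a minimal illustration with n = 2 (chosen small for readability); the True count matches the (n+1)(n+2)(n+3)/6 formula:

```python
import numpy as np

n = 2
ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
# the spanning arrays are cheap: shapes (n+1,1,1), (1,n+1,1), (1,1,n+1)
print(ii.shape, jj.shape, kk.shape)  # (3, 1, 1) (1, 3, 1) (1, 1, 3)
# broadcasting the comparisons produces the full (n+1)^3 boolean box
inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
print(inds.shape)  # (3, 3, 3)
# only (n+1)(n+2)(n+3)/6 of its entries are True
print(inds.sum())  # 10 == 3*4*5/6
```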
The rest is straightforward: we need to generate an additional column l = n - i - j - k for the array that contains the [i,j,k] indices in each row, which we get by subtracting the row sums of the array from n (again a vectorized operation). Once we have the (nsize,4)-shaped auxiliary array that contains each (i,j,k,l) in its rows, we only need to take the matrix product of this object with xv, and we're done.
Testing with small n sizes suggests that the above function produces the same results as yours. Timing for n = 100: 1.15 s for the original, 19 ms for the vectorized version.
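For reference, a sanity check along those lines; the loop version below is reconstructed from the index pattern above (i, j, k run over the same dependent ranges), so the exact original may differ cosmetically:

```python
import numpy as np

xv = np.array([
    [-1., -1., -1.],
    [ 1., -1., -1.],
    [-1.,  1., -1.],
    [-1., -1.,  1.],
])

def tet_loop(n):
    # reconstructed dependent nested loop (assumed shape of the original)
    pts = []
    for i in range(n+1):
        for j in range(n+1-i):
            for k in range(n+1-i-j):
                l = n - i - j - k
                pts.append((i*xv[0] + j*xv[1] + k*xv[2] + l*xv[3]) / n)
    return np.array(pts)

def tet_vect(n):
    ii, jj, kk = np.ogrid[:n+1, :n+1, :n+1]
    inds = (jj < n+1-ii) & (kk < n+1-ii-jj)
    combs = np.vstack(np.where(inds)).T
    lcol = n - combs.sum(axis=1)
    combs = np.hstack((combs, lcol[:, None]))
    return np.matmul(combs, xv) / n

# np.where walks the box in C order (i slowest, then j, then k),
# which matches the nested-loop order, so the points line up row for row
print(np.allclose(tet_loop(5), tet_vect(5)))  # True
```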